PROBABILITY THEORY: BASIC NOTIONS

All epistemologic value of the theory of probability is based
on this: that large scale random phenomena in their collective
action create strict, non random regularity.

(Gnedenko and Kolmogorov, Limit Distributions for Sums of
Independent Random Variables.)

1.1 Introduction 

Randomness stems from our incomplete knowledge of reality, from the
lack of information which forbids a perfect prediction of the future:
randomness arises from complexity, from the fact that causes are diverse,
that tiny perturbations may result in large effects. For over a century
now, Science has abandoned Laplace's deterministic vision, and has fully
accepted the task of deciphering randomness and inventing adequate
tools for its description. The surprise is that, after all, randomness has
many facets and that there are many levels to uncertainty, but, above
all, that a new form of predictability appears, which is no longer
deterministic but statistical.

Financial markets offer an ideal testing ground for these statistical 
ideas: the fact that a large number of participants, with divergent an- 
ticipations and conflicting interests, are simultaneously present in these 
markets, leads to an unpredictable behaviour. Moreover, financial mar- 
kets are (sometimes strongly) affected by external news - which are, 
both in date and in nature, to a large degree unexpected. The statistical 
approach consists in drawing from past observations some information 
on the frequency of possible price changes, and in assuming that these 
frequencies reflect some intimate mechanism of the markets themselves, 
implying that these frequencies will remain stable in the course of time. 
For example, the mechanism underlying the roulette or the game of dice
is obviously always the same, and one expects that the frequency of all
possible outcomes will be invariant in time - although of course each
individual outcome is random.

This 'bet' that probabilities are stable (or better, stationary) is very
reasonable in the case of roulette or dice;^1 it is nevertheless much less
justified in the case of financial markets - despite the large number of
participants, which confers on the system a certain regularity, at least in
the sense of Gnedenko and Kolmogorov. It is clear, for example, that
financial markets do not behave now as they did thirty years ago: many
factors contribute to the evolution of the way markets behave (development
of derivative markets, worldwide and computer-aided trading, etc.). As will
be mentioned in the following, 'young' markets (such as emerging-country
markets) and more mature markets (exchange rate markets, interest rate
markets, etc.) behave quite differently. The statistical approach to
financial markets is based on the idea that whatever evolution takes place,
it happens sufficiently slowly (on the scale of several years) so that the
observation of the recent past is useful to describe a not too distant
future. However, even this 'weak stability' hypothesis is sometimes badly
in error, in particular in the case of a crisis, which marks a sudden
change of market behaviour. The recent example of some Asian currencies
indexed to the dollar (such as the Korean won or the Thai baht) is
interesting, since the observation of past fluctuations is clearly of no
help to predict the sudden turmoil of 1997 - see Fig. 1.1.

Hence, the statistical description of financial fluctuations is certainly
imperfect. It is nevertheless extremely helpful: in practice, the 'weak
stability' hypothesis is in most cases reasonable, at least to describe
risks.^2 In other words, the amplitude of the possible price changes (but
not their sign!) is, to a certain extent, predictable. It is thus rather
important to devise adequate tools, in order to control (if at all possible) 
financial risks. The goal of this first chapter is to present a certain 
number of basic notions in probability theory, which we shall find useful 
in the following. Our presentation does not aim at mathematical rigour, 
but rather tries to present the key concepts in an intuitive way, in order 
to ease their empirical use in practical applications. 



1.2 Probabilities 



^1 The idea that Science ultimately amounts to making the best possible guess of
reality is due to R. P. Feynman ('Seeking New Laws', in The Character of Physical
Laws, MIT Press, 1967).

^2 The prediction of future returns on the basis of past returns is however much less
justified.

Figure 1.1: Three examples of statistically unforeseen crashes: the Korean won
against the dollar in 1997 (top), the British 3 month short term interest rate
futures in 1992 (middle), and the S&P 500 in 1987 (bottom). In the example
of the Korean won, it is particularly clear that the distribution of price
changes before the crisis was extremely narrow, and could not be extrapolated
to anticipate what happened in the crisis period.

1.2.1 Probability distributions



Contrary to the throw of a die, which can only return an integer between
1 and 6, the variation of price of a financial asset^3 can be arbitrary
(we disregard the fact that price changes cannot actually be smaller than
a certain quantity - a 'tick'). In order to describe a random process $X$
for which the result is a real number, one uses a probability density $P(x)$,
such that the probability that $X$ is within a small interval of width $dx$
around $X = x$ is equal to $P(x)\,dx$. In the following, we shall denote
as $P(\cdot)$ the probability density for the variable appearing as the
argument of the function. This is a potentially ambiguous, but very useful
notation.

The probability that $X$ is between $a$ and $b$ is given by the integral of
$P(x)$ between $a$ and $b$,

$$\mathcal{P}(a < X < b) = \int_a^b P(x)\,dx. \tag{1.1}$$

In the following, the notation $\mathcal{P}(\cdot)$ means the probability of a
given event, defined by the content of the parentheses $(\cdot)$.

The function $P(x)$ is a density; in this sense it depends on the units
used to measure $X$. For example, if $X$ is a length measured in centimetres,
$P(x)$ is a probability density per unit length, i.e. per centimetre.
The numerical value of $P(x)$ changes if $X$ is measured in inches, but the
probability that $X$ lies between two specific values $l_1$ and $l_2$ is of
course independent of the chosen unit. $P(x)\,dx$ is thus invariant upon a
change of unit, i.e. under the change of variable $x \to \gamma x$. More
generally, $P(x)\,dx$ is invariant upon any (monotonic) change of variable
$x \to y(x)$: in this case, one has $P(x)\,dx = P(y)\,dy$.

In order to be a probability density in the usual sense, $P(x)$ must be
non-negative ($P(x) \geq 0$ for all $x$) and must be normalised, that is,
the integral of $P(x)$ over the whole range of possible values for $X$ must
be equal to one:

$$\int_{x_m}^{x_M} P(x)\,dx = 1, \tag{1.2}$$

where $x_m$ (resp. $x_M$) is the smallest value (resp. largest) which $X$ can
take. In the case where the possible values of $X$ are not bounded from
below, one takes $x_m = -\infty$, and similarly for $x_M$. One can actually
always assume the bounds to be $\pm\infty$ by setting $P(x)$ to zero in the
intervals $]-\infty, x_m]$ and $[x_M, \infty[$. Later in the text, we shall
often use the symbol $\int$ as a shorthand for $\int_{-\infty}^{+\infty}$.

^3 Asset is the generic name for a financial instrument which can be bought or sold,
like stocks, currencies, gold, bonds, etc.

An equivalent way of describing the distribution of $X$ is to consider
its cumulative distribution $\mathcal{P}_<(x)$, defined as:

$$\mathcal{P}_<(x) \equiv \mathcal{P}(X < x) = \int_{-\infty}^{x} P(x')\,dx'. \tag{1.3}$$

$\mathcal{P}_<(x)$ takes values between zero and one, and is monotonically
increasing with $x$. Obviously, $\mathcal{P}_<(-\infty) = 0$ and
$\mathcal{P}_<(+\infty) = 1$. Similarly, one defines
$\mathcal{P}_>(x) = 1 - \mathcal{P}_<(x)$.
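The definitions of this subsection (probability of an interval, normalisation, cumulative distribution) can be checked numerically. The sketch below assumes, purely for illustration, the density $P(x) = e^{-x}$ on $x \geq 0$, which is not a choice made in the text, and uses a simple midpoint rule; all function names are hypothetical.

```python
import math

def density(x):
    """Illustrative density (an assumption): P(x) = exp(-x) for x >= 0, zero otherwise."""
    return math.exp(-x) if x >= 0.0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def prob_between(a, b):
    """Probability that X lies between a and b: the integral of the density."""
    return integrate(density, a, b)

def cumulative(x, x_min=0.0):
    """Probability that X < x (the assumed density vanishes below x_min)."""
    return integrate(density, x_min, x) if x > x_min else 0.0

normalisation = prob_between(0.0, 50.0)   # the upper bound 50 stands in for infinity
p_1_2 = prob_between(1.0, 2.0)
tail_above_1 = 1.0 - cumulative(1.0)      # probability that X exceeds 1

print(f"normalisation = {normalisation:.6f}")
print(f"P(1 < X < 2)  = {p_1_2:.6f}")
print(f"P(X > 1)      = {tail_above_1:.6f}")
```

For this particular density the three numbers can be compared with the closed forms $1$, $e^{-1} - e^{-2}$ and $e^{-1}$.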



1.2.2 Typical values and deviations 

It is rather natural to speak about 'typical' values of $X$. There are at
least three mathematical definitions of this intuitive notion: the most
probable value, the median and the mean. The most probable value
$x^*$ corresponds to the maximum of the function $P(x)$; $x^*$ need not be
unique if $P(x)$ has several equivalent maxima. The median $x_{\rm med}$ is
such that the probabilities that $X$ be greater or less than this particular
value are equal. In other words,
$\mathcal{P}_<(x_{\rm med}) = \mathcal{P}_>(x_{\rm med}) = \frac{1}{2}$. The mean,
or expected value of $X$, which we shall note as $m$ or $\langle x \rangle$ in
the following, is the average of all possible values of $X$, weighted by
their corresponding probability:

$$m \equiv \langle x \rangle = \int x P(x)\,dx. \tag{1.4}$$

For a unimodal distribution (unique maximum), symmetrical around this
maximum, these three definitions coincide. However, they are in general
different, although often rather close to one another. Figure 1.2 shows
an example of a non-symmetric distribution, and the relative position of
the most probable value, the median and the mean.
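To make the three notions of 'typical value' concrete, here is a small numerical sketch. It assumes an illustrative asymmetric density, $P(x) = x e^{-x}$ on $x \geq 0$, which does not come from the text, and evaluates the most probable value, the median and the mean on a fine grid.

```python
import math

def density(x):
    """Illustrative asymmetric density (an assumption): P(x) = x exp(-x) for x >= 0."""
    return x * math.exp(-x) if x >= 0.0 else 0.0

STEP = 1e-3
grid = [(i + 0.5) * STEP for i in range(40_000)]   # cell midpoints covering [0, 40]
weights = [density(x) * STEP for x in grid]        # P(x) dx for each cell

# Mean: average of x weighted by its probability.
mean = sum(x * w for x, w in zip(grid, weights))

# Most probable value: location of the maximum of the density.
most_probable = max(grid, key=density)

# Median: first grid point where the cumulative probability reaches 1/2.
running, median = 0.0, None
for x, w in zip(grid, weights):
    running += w
    if running >= 0.5:
        median = x
        break

print(f"most probable value = {most_probable:.3f}")
print(f"median              = {median:.3f}")
print(f"mean                = {mean:.3f}")
```

For this density the three values are distinct and ordered as most probable value < median < mean, which is the typical situation for a distribution skewed towards large positive values.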

One can then describe the fluctuations of the random variable $X$:
if the random process is repeated several times, one expects the results
to be scattered in a cloud of a certain 'width' in the region of typical
values of $X$. This width can be described by the mean absolute deviation
(MAD) $E_{\rm abs}$, by the root mean square (RMS) $\sigma$ (or, in financial
terms, the volatility), or by the 'full width at half maximum' $w_{1/2}$.

The mean absolute deviation from a given reference value is the average
of the distance between the possible values of $X$ and this reference
value,^4

$$E_{\rm abs} \equiv \int |x - x_{\rm med}|\, P(x)\,dx. \tag{1.5}$$

^4 One chooses as a reference value the median for the MAD and the mean for the
RMS, because for a fixed distribution $P(x)$, these two quantities minimise,
respectively, the MAD and the RMS.

Figure 1.2: The 'typical value' of a random variable $X$ drawn according to
a distribution density $P(x)$ can be defined in at least three different ways:
through its mean value $\langle x \rangle$, its most probable value $x^*$ or its
median $x_{\rm med}$. In the general case these three values are distinct.



Similarly, the variance ($\sigma^2$) is the mean distance squared to the
reference value $m$,

$$\sigma^2 \equiv \langle (x - m)^2 \rangle = \int (x - m)^2 P(x)\,dx. \tag{1.6}$$

Since the variance has the dimension of $x$ squared, its square root (the
RMS $\sigma$) gives the order of magnitude of the fluctuations around $m$.

Finally, the full width at half maximum $w_{1/2}$ is defined (for a
distribution which is symmetrical around its unique maximum $x^*$) such that
$P(x^* \pm \frac{w_{1/2}}{2}) = \frac{P(x^*)}{2}$, which corresponds to the
points where the probability density has dropped by a factor of two compared
to its maximum value. One could actually define this width slightly
differently, for example such that the total probability to find an event
outside the interval $[x^* - \frac{w}{2}, x^* + \frac{w}{2}]$ is equal to,
say, 0.1.

The pair mean-variance is actually much more popular than the pair
median-MAD. This comes from the fact that the absolute value is not
an analytic function of its argument, and thus does not possess the nice
properties of the variance, such as additivity under convolution, which
we shall discuss below. However, for the empirical study of fluctuations,
it is sometimes preferable to use the MAD; it is more robust than the
variance, that is, less sensitive to rare extreme events, which are a
source of large statistical errors.
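The robustness of the MAD can be illustrated on a toy data set. The sketch below uses made-up numbers (nine ordinary observations, then the same set with one extreme event added) and compares how much each width measure is inflated by the outlier; the helper names are hypothetical.

```python
import math
import statistics

def mad_about_median(values):
    """Mean absolute deviation, taken around the median."""
    med = statistics.median(values)
    return sum(abs(v - med) for v in values) / len(values)

def rms_about_mean(values):
    """Root mean square deviation, taken around the mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Hypothetical data: a calm sample, then the same sample plus one extreme event.
calm = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
with_outlier = calm + [100]

mad_ratio = mad_about_median(with_outlier) / mad_about_median(calm)
rms_ratio = rms_about_mean(with_outlier) / rms_about_mean(calm)

print(f"MAD is multiplied by {mad_ratio:.1f}")
print(f"RMS is multiplied by {rms_ratio:.1f}")
```

Because the deviation enters squared in the RMS, a single extreme event inflates it more than it inflates the MAD.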



1.2.3 Moments and characteristic function 

More generally, one can define higher order moments of the distribution
$P(x)$ as the average of powers of $X$:

$$m_n \equiv \langle x^n \rangle = \int x^n P(x)\,dx. \tag{1.7}$$

Accordingly, the mean $m$ is the first moment ($n = 1$), while the variance
is related to the second moment ($\sigma^2 = m_2 - m^2$). The above definition
(1.7) is only meaningful if the integral converges, which requires that
$P(x)$ decreases sufficiently rapidly for large $|x|$ (see below).

From a theoretical point of view, the moments are interesting: if
they exist, their knowledge is often equivalent to the knowledge of the
distribution $P(x)$ itself.^5 In practice however, the high order moments
are very hard to determine satisfactorily: as $n$ grows, longer and longer
time series are needed to keep a certain level of precision on $m_n$; these
high moments are thus in general not adapted to describe empirical data.

For many computational purposes, it is convenient to introduce the
characteristic function of $P(x)$, defined as its Fourier transform:

$$\hat{P}(z) \equiv \int e^{izx} P(x)\,dx. \tag{1.8}$$

The function $P(x)$ is itself related to its characteristic function through
an inverse Fourier transform:

$$P(x) = \frac{1}{2\pi} \int e^{-izx} \hat{P}(z)\,dz. \tag{1.9}$$

Since $P(x)$ is normalised, one always has $\hat{P}(0) = 1$. The moments of
$P(x)$ can be obtained through successive derivatives of the characteristic
function at $z = 0$,

$$m_n = (-i)^n \left. \frac{d^n}{dz^n} \hat{P}(z) \right|_{z=0}. \tag{1.10}$$

One finally defines the cumulants $c_n$ of a distribution as the successive
derivatives of the logarithm of its characteristic function:

$$c_n = (-i)^n \left. \frac{d^n}{dz^n} \log \hat{P}(z) \right|_{z=0}. \tag{1.11}$$

^5 This is not rigorously correct, since one can exhibit examples of different
distribution densities which possess exactly the same moments: see 1.3.2 below.

The cumulant $c_n$ is a polynomial combination of the moments $m_p$ with
$p \leq n$. For example $c_2 = m_2 - m^2 = \sigma^2$. It is often useful to
normalise the cumulants by an appropriate power of the variance, such that
the resulting quantity is dimensionless. One thus defines the normalised
cumulants $\lambda_n$,

$$\lambda_n \equiv c_n / \sigma^n. \tag{1.12}$$

One often uses the third and fourth normalised cumulants, called the
skewness and kurtosis ($\kappa$),^6

$$\lambda_3 = \frac{\langle (x-m)^3 \rangle}{\sigma^3}, \qquad \kappa \equiv \lambda_4 = \frac{\langle (x-m)^4 \rangle}{\sigma^4} - 3. \tag{1.13}$$

The above definition of cumulants may look arbitrary, but these quantities
have remarkable properties. For example, as we shall show in Section 1.5,
the cumulants simply add when one sums independent random variables.
Moreover a Gaussian distribution (or the normal law of Laplace
and Gauss) is characterised by the fact that all cumulants of order larger
than two are identically zero. Hence the cumulants, in particular $\kappa$,
can be interpreted as a measure of the distance between a given distribution
$P(x)$ and a Gaussian.
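As an illustration of the skewness and kurtosis as 'distance from a Gaussian', the following sketch computes both by numerical integration, taking the skewness as the third central moment over $\sigma^3$ and the kurtosis as the fourth central moment over $\sigma^4$, minus 3. The two test densities (a unit Gaussian and a one-sided exponential) and the function names are illustrative assumptions.

```python
import math

def skewness_and_kurtosis(density, lower, upper, steps=200_000):
    """Third and fourth normalised cumulants of a density, by the midpoint rule."""
    h = (upper - lower) / steps
    xs = [lower + (i + 0.5) * h for i in range(steps)]
    ws = [density(x) * h for x in xs]
    norm = sum(ws)
    mean = sum(x * w for x, w in zip(xs, ws)) / norm

    def central_moment(k):
        return sum((x - mean) ** k * w for x, w in zip(xs, ws)) / norm

    sigma = math.sqrt(central_moment(2))
    skewness = central_moment(3) / sigma ** 3
    kurtosis = central_moment(4) / sigma ** 4 - 3.0
    return skewness, kurtosis

def gaussian(x):
    """Gaussian density with zero mean and unit variance."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def one_sided_exponential(x):
    """exp(-x) on x >= 0: an asymmetric density with a slowly decaying tail."""
    return math.exp(-x) if x >= 0.0 else 0.0

gauss_skew, gauss_kurt = skewness_and_kurtosis(gaussian, -12.0, 12.0)
exp_skew, exp_kurt = skewness_and_kurtosis(one_sided_exponential, 0.0, 60.0)

print(f"Gaussian:    skewness = {gauss_skew:.4f}, kurtosis = {gauss_kurt:.4f}")
print(f"exponential: skewness = {exp_skew:.4f}, kurtosis = {exp_kurt:.4f}")
```

Both quantities vanish (up to numerical error) for the Gaussian, while the one-sided exponential has a skewness of 2 and a kurtosis of 6.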

1.2.4 Divergence of moments - Asymptotic behaviour

The moments (or cumulants) of a given distribution do not always exist.
A necessary condition for the $n$th moment ($m_n$) to exist is that the
distribution density $P(x)$ should decay faster than $1/|x|^{n+1}$ for $|x|$
going towards infinity, or else the integral (1.7) would diverge for $|x|$
large. If one restricts to distribution densities behaving asymptotically
as a power law, with an exponent $1 + \mu$,

$$P(x) \sim \frac{\mu A_\pm^\mu}{|x|^{1+\mu}} \quad \text{for } x \to \pm\infty, \tag{1.14}$$

then all the moments such that $n \geq \mu$ are infinite. For example, such a
distribution has no finite variance whenever $\mu \leq 2$. [Note that, for
$P(x)$ to be a normalisable probability distribution, the integral (1.2) must
converge, which requires $\mu > 0$.]
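The divergence can be seen by cutting the moment integral off at a finite value and letting the cutoff grow. The sketch below assumes the simplest possible case, a pure Pareto density $\mu / x^{1+\mu}$ on $x \geq 1$ (an illustrative choice, not one made in the text), for which the truncated integral is known in closed form.

```python
import math

def truncated_moment(n, mu, cutoff):
    """Integral of x**n * mu / x**(1 + mu) from 1 to cutoff (pure Pareto density on x >= 1)."""
    if n == mu:
        return mu * math.log(cutoff)          # marginal case: logarithmic divergence
    return mu * (cutoff ** (n - mu) - 1.0) / (n - mu)

cutoffs = [1e2, 1e4, 1e6]
second_moment_mu_1_5 = [truncated_moment(2, 1.5, c) for c in cutoffs]   # mu < 2
second_moment_mu_2 = [truncated_moment(2, 2.0, c) for c in cutoffs]     # mu = 2
second_moment_mu_3 = [truncated_moment(2, 3.0, c) for c in cutoffs]     # mu > 2

for c, a, b, d in zip(cutoffs, second_moment_mu_1_5, second_moment_mu_2, second_moment_mu_3):
    print(f"cutoff = {c:.0e}: mu=1.5 -> {a:.1f}, mu=2 -> {b:.2f}, mu=3 -> {d:.6f}")
```

The truncated second moment keeps growing with the cutoff for $\mu = 1.5$ (as a power) and for $\mu = 2$ (logarithmically), while it settles to a finite value for $\mu = 3$.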

The characteristic function of a distribution having an asymptotic
power law behaviour given by (1.14) is non-analytic around $z = 0$. The
small $z$ expansion contains regular terms of the form $z^n$ for $n < \mu$
followed by a non-analytic term $|z|^\mu$ (possibly with logarithmic
corrections such as $|z|^\mu \log z$ for integer $\mu$). The derivatives of
order larger or equal to $\mu$ of the characteristic function thus do not
exist at the origin ($z = 0$).

^6 Note that it is sometimes $\kappa + 3$, rather than $\kappa$ itself, which is called the kurtosis.

1.3 Some useful distributions



1.3.1 Gaussian distribution 



The most commonly encountered distributions are the 'normal' laws of
Laplace and Gauss, which we shall simply call in the following Gaussians.
Gaussians are ubiquitous: for example, the number of heads in a
sequence of a thousand coin tosses, the exact number of oxygen molecules
in the room, the height (in inches) of a randomly selected individual, are
all approximately described by a Gaussian distribution.^7 The ubiquity
of the Gaussian can be in part traced to the Central Limit Theorem
(CLT) discussed at length below, which states that a phenomenon resulting
from a large number of small independent causes is Gaussian. There
exist however a large number of cases where the distribution describing
a complex phenomenon is not Gaussian: for example, the amplitude of
earthquakes, the velocity differences in a turbulent fluid, the stresses in
granular materials, etc., and, as we shall discuss in the next chapter, the
price fluctuations of most financial assets.

A Gaussian of mean $m$ and root mean square $\sigma$ is defined as:

$$P_G(x) \equiv \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-m)^2}{2\sigma^2} \right). \tag{1.15}$$

The median and most probable value are in this case equal to $m$, while
the MAD (or any other definition of the width) is proportional to the RMS
(for example, $E_{\rm abs} = \sigma\sqrt{2/\pi}$). For $m = 0$, all the odd
moments are zero while the even moments are given by
$m_{2n} = (2n-1)(2n-3)\cdots\sigma^{2n} = (2n-1)!!\,\sigma^{2n}$.

All the cumulants of order greater than two are zero for a Gaussian.
This can be realised by examining its characteristic function:

$$\hat{P}_G(z) = \exp\left( -\frac{\sigma^2 z^2}{2} + imz \right). \tag{1.16}$$

Its logarithm is a second order polynomial, for which all derivatives of
order larger than two are zero. In particular, the kurtosis of a Gaussian
variable is zero. As mentioned above, the kurtosis is often taken as a
measure of the distance from a Gaussian distribution. When $\kappa > 0$
(leptokurtic distributions), the corresponding distribution density has a
marked peak around the mean, and rather 'thick' tails. Conversely, when
$\kappa < 0$, the distribution density has a flat top and very thin tails. For
example, the uniform distribution over a certain interval (for which tails
are absent) has a kurtosis $\kappa = -\frac{6}{5}$.

^7 Although, in the above three examples, the random variable cannot be negative.
As we shall discuss below, the Gaussian description is generally only valid in a certain
neighbourhood of the maximum of the distribution.

A Gaussian variable is peculiar because 'large deviations' are extremely
rare. The quantity $\exp(-x^2/2\sigma^2)$ decays so fast for large $x$ that
deviations of a few times $\sigma$ are nearly impossible. For example, a
Gaussian variable departs from its most probable value by more than $2\sigma$
only 5% of the time, by more than $3\sigma$ only about 0.3% of the time, while
a fluctuation of $10\sigma$ has a probability of less than $2 \times 10^{-23}$;
in other words, it never happens.
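These orders of magnitude are easy to reproduce. For a Gaussian, the probability of a deviation larger than $k$ standard deviations (in either direction) is $\mathrm{erfc}(k/\sqrt{2})$, where erfc is the complementary error function; the short sketch below evaluates it with Python's standard library (the function name is hypothetical).

```python
import math

def prob_beyond(k):
    """Probability that a Gaussian variable lies more than k standard deviations from its mean."""
    return math.erfc(k / math.sqrt(2.0))

for k in (2, 3, 10):
    print(f"P(|x - m| > {k} sigma) = {prob_beyond(k):.3e}")
```

The three values come out near 4.6%, 0.27% and $1.5 \times 10^{-23}$ respectively.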



1.3.2 Log-normal distribution 

Another very popular distribution in mathematical finance is the so-called
'log-normal' law. That $X$ is a log-normal random variable simply
means that $\log X$ is normal, or Gaussian. Its use in finance comes from
the assumption that the rates of return, rather than the absolute changes
of prices, are independent random variables. The increments of the
logarithm of the price thus asymptotically sum to a Gaussian, according
to the CLT detailed below. The log-normal distribution density is thus
defined as:^8

$$P_{LN}(x) \equiv \frac{1}{x\sqrt{2\pi\sigma^2}} \exp\left( -\frac{\log^2(x/x_0)}{2\sigma^2} \right), \tag{1.17}$$

the moments of which being: $m_n = x_0^n e^{n^2\sigma^2/2}$.
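The moment formula, taken here as $m_n = x_0^n e^{n^2\sigma^2/2}$, can be checked with the change of variable $u = \log(x/x_0)$: since $P(x)\,dx = P(u)\,du$ and $u$ is Gaussian with zero mean and variance $\sigma^2$, the $n$th moment is $x_0^n$ times the Gaussian average of $e^{nu}$. The sketch below does this integral numerically; the parameter values $x_0 = 100$ and $\sigma = 0.15$ and the function names are illustrative assumptions.

```python
import math

def lognormal_moment(n, x0, sigma, steps=200_000, width=12.0):
    """n-th moment of a log-normal variable, integrating over u = log(x / x0)."""
    lower, upper = -width * sigma, width * sigma
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        u = lower + (i + 0.5) * h
        gauss = math.exp(-u * u / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)
        total += math.exp(n * u) * gauss * h      # x**n = x0**n * exp(n * u)
    return x0 ** n * total

def closed_form(n, x0, sigma):
    """Closed-form moment x0**n * exp(n**2 * sigma**2 / 2)."""
    return x0 ** n * math.exp(n ** 2 * sigma ** 2 / 2.0)

x0, sigma = 100.0, 0.15
relative_errors = [
    abs(lognormal_moment(n, x0, sigma) / closed_form(n, x0, sigma) - 1.0)
    for n in (1, 2, 3, 4)
]
print(relative_errors)
```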

In the context of mathematical finance, one often prefers log-normal
to Gaussian distributions for several reasons. As mentioned above, the
existence of a random rate of return, or random interest rate, naturally
leads to log-normal statistics. Furthermore, log-normals account for the
following symmetry in the problem of exchange rates:^9 if $x$ is the rate of
currency A in terms of currency B, then obviously, $1/x$ is the rate of
currency B in terms of A. Under this transformation, $\log x$ becomes
$-\log x$ and the description in terms of a log-normal distribution (or in
terms of any other even function of $\log x$) is independent of the reference
currency. One often hears the following argument in favour of log-normals:
since the price of an asset cannot be negative, its statistics cannot be
Gaussian since the latter admits in principle negative values, while a
log-normal excludes them by construction. This is however a red-herring
argument, since the description of the fluctuations of the price of a
financial asset in terms of Gaussian or log-normal statistics is in any case
an approximation which is only valid in a certain range. As we shall discuss
at length below, these approximations are totally unsuited to describe
extreme risks. Furthermore, even if a price drop of more than 100% is
in principle possible for a Gaussian process,^10 the error caused by
neglecting such an event is much smaller than that induced by the use
of either of these two distributions (Gaussian or log-normal). In order
to illustrate this point more clearly, consider the probability of observing
$n$ times 'heads' in a series of $N$ coin tosses, which is exactly equal to
$2^{-N} C_N^n$. It is also well known that in the neighbourhood of $N/2$,
$2^{-N} C_N^n$ is very accurately approximated by a Gaussian of variance
$N/4$; this is however not contradictory with the fact that $n \geq 0$ by
construction!

^8 A log-normal distribution has the remarkable property that the knowledge of
all its moments is not sufficient to characterise the corresponding distribution. It
is indeed easy to show that the following distribution:
$\frac{1}{\sqrt{2\pi}} x^{-1} e^{-\frac{1}{2}(\log x)^2} [1 + a \sin(2\pi \log x)]$,
for $|a| \leq 1$, has moments which are independent of the value of
$a$, and thus coincide with those of a log-normal distribution, which corresponds to
$a = 0$ ([Feller] p. 227).

^9 This symmetry is however not always obvious. The dollar, for example, plays
a special role. This symmetry can only be expected between currencies of similar
strength.

Finally, let us note that for moderate volatilities (up to say 20%), the
two distributions (Gaussian and log-normal) look rather alike, especially
in the 'body' of the distribution (Fig. 1.3). As for the tails, we shall see
below that Gaussians substantially underestimate their weight, while the
log-normal predicts that large positive jumps are more frequent than
large negative jumps. This is at variance with empirical observation:
the distributions of absolute stock price changes are rather symmetrical;
if anything, large negative draw-downs are more frequent than large
positive draw-ups.
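Returning to the coin-toss argument made above, the quality of the Gaussian approximation near $N/2$ can be checked directly, together with the footnote's estimate for a 20% annual volatility. The sketch below is only an illustration: $N = 1000$ and the chosen values of $n$ are arbitrary, and the function names are hypothetical.

```python
import math

def binomial_heads(n, N):
    """Exact probability of n heads in N fair coin tosses."""
    return math.comb(N, n) / 2 ** N

def gaussian_approx(n, N):
    """Gaussian density of mean N/2 and variance N/4, evaluated at n."""
    variance = N / 4.0
    return math.exp(-(n - N / 2.0) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

N = 1000
for n in (500, 520, 550):
    exact, approx = binomial_heads(n, N), gaussian_approx(n, N)
    print(f"n = {n}: exact = {exact:.6e}, Gaussian = {approx:.6e}")

# Footnote example: 20% annual volatility, zero annual return, Gaussian price changes.
# The price turns negative if the yearly change is below -100%, i.e. -5 standard deviations.
prob_negative_price = 0.5 * math.erfc(5.0 / math.sqrt(2.0))
print(f"P(negative price after one year) = {prob_negative_price:.2e}")
```

The Gaussian matches the exact binomial probabilities closely in the centre even though it formally assigns a (negligible) weight to $n < 0$, which is the point of the argument.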



1.3.3 Lévy distributions and Paretian tails

Lévy distributions (noted $L_\mu(x)$ below) appear naturally in the context
of the CLT (see below), because of their stability property under addition
(a property shared by Gaussians). The tails of Lévy distributions
are however much 'fatter' than those of Gaussians, and are thus useful
to describe multiscale phenomena (i.e. when both very large and very
small values of a quantity can commonly be observed - such as personal
income, size of pension funds, amplitude of earthquakes or other natural
catastrophes, etc.). These distributions were introduced in the fifties and
sixties by Mandelbrot (following Pareto) to describe personal income and
the price changes of some financial assets, in particular the price of cotton
[Mandelbrot]. An important constitutive property of these Lévy
distributions is their power-law behaviour for large arguments, often called
'Pareto tails':

$$L_\mu(x) \sim \frac{\mu A_\pm^\mu}{|x|^{1+\mu}} \quad \text{for } x \to \pm\infty, \tag{1.18}$$

where $0 < \mu < 2$ is a certain exponent (often called $\alpha$), and
$A_\pm^\mu$ two constants which we call tail amplitudes, or scale parameters:
$A_\pm$ indeed gives the order of magnitude of the large (positive or
negative) fluctuations of $x$. For instance, the probability to draw a number
larger than $x$ decreases as $\mathcal{P}_>(x) = (A_+/x)^\mu$ for large
positive $x$.

^10 In the rather extreme case of a 20% annual volatility and a zero annual return,
the probability for the price to become negative after a year in a Gaussian description
is less than one out of three million.

Figure 1.3: Comparison between a Gaussian (thick line) and a log-normal
(dashed line), with $m = x_0 = 100$ and $\sigma$ equal to 15 and 15% respectively.
The difference between the two curves shows up in the tails.

One can of course in principle observe Pareto tails with $\mu \geq 2$;
however, those tails do not correspond to the asymptotic behaviour of a Lévy
distribution.

In full generality, Lévy distributions are characterised by an asymmetry
parameter defined as
$\beta \equiv (A_+^\mu - A_-^\mu)/(A_+^\mu + A_-^\mu)$, which measures
the relative weight of the positive and negative tails. We shall mostly
focus in the following on the symmetric case $\beta = 0$. The fully asymmetric
case ($\beta = 1$) is also useful to describe strictly positive random
variables, such as, for example, the time during which the price of an asset
remains below a certain value, etc.



An important consequence of (1.14) with $\mu < 2$ is that the variance
of a Lévy distribution is formally infinite: the probability density does
not decay fast enough for the integral (1.6) to converge. In the case
$\mu \leq 1$, the distribution density decays so slowly that even the mean, or
the MAD, fail to exist.^11 The scale of the fluctuations, defined by the
width of the distribution, is always set by $A = A_+ = A_-$.

There is unfortunately no simple analytical expression for symmetric
Lévy distributions $L_\mu(x)$, except for $\mu = 1$, which corresponds to a
Cauchy distribution (or 'Lorentzian'):

$$L_1(x) = \frac{A}{x^2 + \pi^2 A^2}. \tag{1.19}$$
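The Cauchy case is a convenient place to check the tail convention used in this section. Assuming the density is $A/(x^2 + \pi^2 A^2)$, integrating it from $x$ to infinity gives $\mathcal{P}_>(x) = \arctan(\pi A / x)/\pi$ for $x > 0$; the sketch below verifies that this tail probability behaves as $A/x$ at large $x$, as expected for a Pareto tail with $\mu = 1$. Function names and the value $A = 2$ are illustrative.

```python
import math

def cauchy_density(x, A):
    """Symmetric Levy distribution with mu = 1 (Cauchy), written with tail amplitude A."""
    return A / (x * x + math.pi ** 2 * A ** 2)

def cauchy_tail(x, A):
    """Probability to draw a value larger than x > 0 (exact integral of the density)."""
    return math.atan(math.pi * A / x) / math.pi

A = 2.0

# Half of the probability lies above the centre of the symmetric distribution.
half_mass = cauchy_tail(1e-9, A)

# The tail probability is minus the integral of the density: check by finite differences.
x, h = 3.0, 1e-5
slope = (cauchy_tail(x - h, A) - cauchy_tail(x + h, A)) / (2.0 * h)

# For large x the tail approaches the Pareto form (A / x) with exponent mu = 1.
pareto_ratio = cauchy_tail(1.0e4, A) / (A / 1.0e4)

print(half_mass, slope, cauchy_density(x, A), pareto_ratio)
```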

However, the characteristic function of a symmetric Lévy distribution is
rather simple, and reads:

$$\hat{L}_\mu(z) = \exp(-a_\mu |z|^\mu), \tag{1.20}$$

where $a_\mu$ is a certain constant, proportional to the tail parameter
$A^\mu$.^12 It is thus clear that in the limit $\mu = 2$, one recovers the
definition of a Gaussian. When $\mu$ decreases from 2, the distribution
becomes more and more sharply peaked around the origin and fatter in its
tails, while 'intermediate' events lose weight (Fig. 1.4). These
distributions thus describe 'intermittent' phenomena, very often small,
sometimes gigantic.

Note finally that Eq. (1.20) does not define a probability distribution
when $\mu > 2$, because its inverse Fourier transform is not everywhere
positive.



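In the case $\beta \neq 0$, one would have:

$$\hat{L}_\mu^\beta(z) = \exp\left[ -a_\mu |z|^\mu \left( 1 + i\beta \tan(\mu\pi/2) \frac{z}{|z|} \right) \right] \quad (\mu \neq 1). \tag{1.21}$$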



It is important to notice that while the leading asymptotic term for
large $x$ is given by Eq. (1.18), there are subleading terms which can be
important for finite $x$. The full asymptotic series actually reads:

$$L_\mu(x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{\pi\, n!} \frac{a_\mu^n}{x^{1+n\mu}} \Gamma(1 + n\mu) \sin(\pi\mu n/2). \tag{1.22}$$

The presence of the subleading terms may lead to a bad empirical estimate
of the exponent $\mu$ based on a fit of the tail of the distribution. In
particular, the 'apparent' exponent which describes the function $L_\mu$ for
finite $x$ is larger than $\mu$, and decreases towards $\mu$ for
$x \to \infty$, but more and more slowly as $\mu$ gets nearer to the Gaussian
value $\mu = 2$, for which the power-law tails no longer exist. Note however
that one also often observes empirically the opposite behaviour, i.e. an
apparent Pareto exponent which grows with $x$. This arises when the Pareto
distribution (1.18) is only valid in an intermediate regime $x \ll 1/\alpha$,
beyond which the distribution decays exponentially, say as
$\exp(-\alpha x)$. The Pareto tail is then 'truncated' for large values of
$x$, and this leads to an effective $\mu$ which grows with $x$.

^11 The median and the most probable value however still exist. For a symmetric
Lévy distribution, the most probable value defines the so-called 'localisation'
parameter $m$.

^12 For example, when $1 < \mu < 2$, $A^\mu = \mu\Gamma(\mu - 1)\sin(\pi\mu/2)\,a_\mu/\pi$.

Figure 1.4: Shape of the symmetric Lévy distributions with $\mu = 0.8$, 1.2, 1.6
and 2 (this last value actually corresponds to a Gaussian). The smaller $\mu$, the
sharper the 'body' of the distribution, and the fatter the tails, as illustrated
in the inset.

An interesting generalisation of the Lévy distributions which accounts
for this exponential cut-off is given by the 'truncated Lévy distributions'
(TLD), which will be of much use in the following. A simple way
to alter the characteristic function (1.20) to account for an exponential
cut-off for large arguments is to set:^13

$$\hat{L}_\mu^{(t)}(z) = \exp\left[ -a_\mu \frac{(\alpha^2 + z^2)^{\mu/2} \cos(\mu \arctan(|z|/\alpha)) - \alpha^\mu}{\cos(\pi\mu/2)} \right], \tag{1.23}$$

for $1 < \mu < 2$. The above form reduces to (1.20) for $\alpha = 0$. Note that
the argument in the exponential can also be written as:

$$-\frac{a_\mu}{2\cos(\pi\mu/2)} \left[ (\alpha + iz)^\mu + (\alpha - iz)^\mu - 2\alpha^\mu \right]. \tag{1.24}$$

^13 See I. Koponen, "Analytic approach to the problem of convergence to truncated
Levy flights towards the Gaussian stochastic process," Physical Review E, 52, 1197,
(1995).

Exponential tail: a limiting case 

Very often in the following, we shall notice that in the formal limit
$\mu \to \infty$, the power-law tail becomes an exponential tail, if the tail
parameter is simultaneously scaled as $A^\mu = (\mu/\alpha)^\mu$. Qualitatively,
this can be understood as follows: consider a probability distribution
restricted to positive $x$, which decays as a power-law for large $x$, defined
as:

$$\mathcal{P}_>(x) = \frac{A^\mu}{(A + x)^\mu}. \tag{1.25}$$

This shape is obviously compatible with (1.14), and is such that
$\mathcal{P}_>(x = 0) = 1$. If $A = (\mu/\alpha)$, one then finds:

$$\mathcal{P}_>(x) = \frac{1}{\left[ 1 + \frac{\alpha x}{\mu} \right]^\mu} \underset{\mu \to \infty}{\longrightarrow} \exp(-\alpha x). \tag{1.26}$$
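The convergence of the power-law tail towards an exponential is easy to observe numerically. The sketch below evaluates $[1 + \alpha x/\mu]^{-\mu}$ for increasing $\mu$ at a fixed point; the values $\alpha = 1$ and $x = 3$ are arbitrary illustrative choices.

```python
import math

def power_law_tail(x, mu, alpha):
    """P_>(x) = 1 / (1 + alpha * x / mu) ** mu, the power-law tail with A = mu / alpha."""
    return (1.0 + alpha * x / mu) ** (-mu)

alpha, x = 1.0, 3.0
limit = math.exp(-alpha * x)
exponents = (1.5, 10.0, 100.0, 1.0e4)
relative_gaps = [power_law_tail(x, mu, alpha) / limit - 1.0 for mu in exponents]

for mu, gap in zip(exponents, relative_gaps):
    print(f"mu = {mu:>8}: relative excess over exp(-alpha x) = {gap:.3e}")
```

The power-law tail always lies above the exponential, and the gap shrinks steadily as $\mu$ grows.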

1.3.4 Other distributions (*) 

There are obviously a very large number of other statistical distributions
useful to describe random phenomena. Let us cite a few, which often
appear in a financial context:

• The discrete Poisson distribution: consider a set of points randomly
scattered on the real axis, with a certain density $\omega$ (e.g. the
times when the price of an asset changes). The number of points $n$
in an arbitrary interval of length $\ell$ is distributed according to the
Poisson distribution:

$$P(n) \equiv \frac{(\omega\ell)^n}{n!} \exp(-\omega\ell). \tag{1.27}$$

• The hyperbolic distribution, which interpolates between a Gaussian
'body' and exponential tails:

$$P_H(x) \equiv \frac{1}{2 x_0 K_1(\alpha x_0)} \exp\left[ -\alpha\sqrt{x_0^2 + x^2} \right], \tag{1.28}$$

where the normalisation $K_1(\alpha x_0)$ is a modified Bessel function of
the second kind. For $x$ small compared to $x_0$, $P_H(x)$ behaves as a
Gaussian while its asymptotic behaviour for $x \gg x_0$ is fatter and
reads $\exp(-\alpha|x|)$.

From the characteristic function

$$\hat{P}_H(z) = \frac{\alpha K_1(x_0\sqrt{\alpha^2 + z^2})}{K_1(\alpha x_0)\sqrt{\alpha^2 + z^2}}, \tag{1.29}$$

we can compute the variance

$$\sigma^2 = \frac{x_0 K_2(\alpha x_0)}{\alpha K_1(\alpha x_0)}, \tag{1.30}$$

and kurtosis

$$\kappa = 3\left( \frac{K_1(\alpha x_0)}{K_2(\alpha x_0)} \right)^2 + \frac{12}{\alpha x_0} \frac{K_1(\alpha x_0)}{K_2(\alpha x_0)} - 3. \tag{1.31}$$

Note that the kurtosis of the hyperbolic distribution is always between
zero and three. In the case $x_0 = 0$, one finds the symmetric
exponential distribution:

$$P_E(x) = \frac{\alpha}{2} \exp(-\alpha|x|), \tag{1.32}$$

with even moments $m_{2n} = (2n)!\,\alpha^{-2n}$, which gives
$\sigma^2 = 2\alpha^{-2}$ and $\kappa = 3$. Its characteristic function reads:
$\hat{P}_E(z) = \alpha^2/(\alpha^2 + z^2)$.

• The Student distribution, which also has power-law tails:

$$P_S(x) \equiv \frac{1}{\sqrt{\pi}} \frac{\Gamma((1+\mu)/2)}{\Gamma(\mu/2)} \frac{a^\mu}{(a^2 + x^2)^{(1+\mu)/2}}, \tag{1.33}$$

which coincides with the Cauchy distribution for $\mu = 1$, and tends
towards a Gaussian in the limit $\mu \to \infty$, provided that $a^2$ is scaled
as $\mu$. The even moments of the Student distribution read:
$m_{2n} = (2n-1)!!\,\Gamma(\mu/2 - n)/\Gamma(\mu/2)\,(a^2/2)^n$ provided
$2n < \mu$; and are infinite otherwise. One can check that in the limit
$\mu \to \infty$, the above expression gives back the moments of a Gaussian:
$m_{2n} = (2n-1)!!\,\sigma^{2n}$. Figure 1.5 shows a plot of the Student
distribution with $\kappa = 1$, corresponding to $\mu = 10$.

Figure 1.5: Probability density for the truncated Lévy ($\mu = 3/2$), Student and
hyperbolic distributions. All three have two free parameters which were fixed
to have unit variance and kurtosis. The inset shows a blow-up of the tails
where one can see that the Student distribution has tails similar to (but slightly
thicker than) those of the truncated Lévy.
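Two of the statements made in the list above lend themselves to a quick numerical check: that the kurtosis of the hyperbolic distribution lies between zero and three (reaching three in the exponential limit $x_0 = 0$), and that a Student distribution with $\mu = 10$ has a kurtosis of 1. The sketch below integrates the hyperbolic density numerically without any Bessel function, and uses the Gamma-function ratios of the Student even moments with the scale factor set to one, since it cancels in the kurtosis. Parameter values and function names are illustrative assumptions.

```python
import math

def hyperbolic_kurtosis(alpha, x0, steps=200_000):
    """Kurtosis of the density proportional to exp(-alpha * sqrt(x0**2 + x**2)), by midpoint rule."""
    half_width = x0 + 60.0 / alpha            # beyond this the density is negligible
    h = 2.0 * half_width / steps
    norm = second = fourth = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        # The constant factor exp(alpha * x0) is pulled out; it cancels in the ratios below.
        w = math.exp(-alpha * (math.sqrt(x0 * x0 + x * x) - x0)) * h
        norm += w
        second += x * x * w
        fourth += x ** 4 * w
    variance = second / norm                  # the mean vanishes by symmetry
    return fourth / norm / variance ** 2 - 3.0

def student_kurtosis(mu):
    """Kurtosis from the even moments of the Student distribution (requires mu > 4)."""
    def even_moment(n):
        double_factorial = math.prod(range(2 * n - 1, 0, -2))   # (2n - 1)!!
        return double_factorial * math.gamma(mu / 2.0 - n) / math.gamma(mu / 2.0)
    return even_moment(2) / even_moment(1) ** 2 - 3.0

kurt_exponential = hyperbolic_kurtosis(1.0, 0.0)   # x0 = 0: symmetric exponential
kurt_mid = hyperbolic_kurtosis(1.0, 1.0)
kurt_wide = hyperbolic_kurtosis(1.0, 5.0)          # larger x0: closer to a Gaussian

print(kurt_exponential, kurt_mid, kurt_wide)
print(student_kurtosis(10.0))
```

The hyperbolic kurtosis decreases from three towards zero as $x_0$ grows, consistent with the interpolation between exponential and Gaussian behaviour.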



1.4 Maximum of random variables - Statistics of extremes

If one observes a series of N independent realizations of the same random 
phenomenon, a question which naturally arises, in particular when one 
is concerned about risk control, is to determine the order of magnitude 
of the maximum observed value of the random variable (which can be 
the price drop of a financial asset, or the water level of a flooding river, 
etc.). For example, in Chapter 3, the so-called 'Value-at-Risk' (VaR) on 
a typical time horizon will be defined as the possible maximum loss over 
that period (within a certain confidence level). 

The law of large numbers tells us that an event which has a probability
$p$ of occurrence appears on average $Np$ times on a series of $N$
observations. One thus expects to observe events which have a probability
of at least $1/N$. It would be surprising to encounter an event
which has a probability much smaller than $1/N$. The order of magnitude
of the largest event observed in a series of $N$ independent identically
distributed (iid) random variables is thus given by:

$$\mathcal{P}_>(\Lambda_{\max}) = 1/N. \qquad (1.34)$$

More precisely, the full probability distribution of the maximum value
$x_{\max} = \max_{i=1,N}\{x_i\}$ is relatively easy to characterise; this will justify
the above simple criterion (1.34). The cumulative distribution $\mathcal{P}(x_{\max} <
\Lambda)$ is obtained by noticing that if the maximum of all $x_i$'s is smaller than
$\Lambda$, all of the $x_i$'s must be smaller than $\Lambda$. If the random variables are
IID, one finds:

$$\mathcal{P}(x_{\max} < \Lambda) = [\mathcal{P}_<(\Lambda)]^N. \qquad (1.35)$$

Note that this result is general, and does not rely on a specific choice for
$P(x)$. When $\Lambda$ is large, it is useful to use the following approximation:

$$\mathcal{P}(x_{\max} < \Lambda) = [1 - \mathcal{P}_>(\Lambda)]^N \simeq e^{-N\mathcal{P}_>(\Lambda)}. \qquad (1.36)$$

Since we now have a simple formula for the distribution of $x_{\max}$, one
can invert it in order to obtain, for example, the median value of the
maximum, noted $\Lambda_{\rm med}$, such that $\mathcal{P}(x_{\max} < \Lambda_{\rm med}) = 1/2$:

$$\mathcal{P}_>(\Lambda_{\rm med}) = 1 - \left(\frac{1}{2}\right)^{1/N} \simeq \frac{\log 2}{N}. \qquad (1.37)$$

More generally, the value $\Lambda_p$ which is greater than $x_{\max}$ with probability
$p$ is given by

$$\mathcal{P}_>(\Lambda_p) \simeq -\frac{\log p}{N}. \qquad (1.38)$$



The quantity $\Lambda_{\max}$ defined by Eq. (1.34) above is thus such that $p =
1/e \simeq 0.37$. The probability that $x_{\max}$ is even larger than $\Lambda_{\max}$ is thus
63%. As we shall now show, $\Lambda_{\max}$ also corresponds, in many cases, to
the most probable value of $x_{\max}$.



Equation (1.38) will be very useful in Chapter 3 to estimate a maximal
potential loss within a certain confidence level. For example, the
largest daily loss $\Lambda$ expected next year, with 95% confidence, is defined
such that $\mathcal{P}_<(-\Lambda) = -\log(0.95)/250$, where $\mathcal{P}_<$ is the cumulative
distribution of daily price changes, and 250 is the number of market days
per year.
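The numbers in this example can be checked with a few lines of Python. The sketch below is only illustrative: the function names are hypothetical, and it simply compares the exact relation $[1 - \mathcal{P}_>]^N = p$ with the large-$N$ approximation (1.38).

```python
import math


def tail_probability(p: float, n_obs: int) -> float:
    """Exact tail probability P_>(Lambda_p) solving [1 - P_>]**n_obs = p."""
    return 1.0 - p ** (1.0 / n_obs)


def tail_probability_approx(p: float, n_obs: int) -> float:
    """Large-n_obs approximation of Eq. (1.38): P_> ~ -log(p) / n_obs."""
    return -math.log(p) / n_obs


if __name__ == "__main__":
    # 95% confidence over 250 market days, as in the text.
    exact = tail_probability(0.95, 250)
    approx = tail_probability_approx(0.95, 250)
    print(f"exact tail probability  : {exact:.6e}")
    print(f"approximation (1.38)    : {approx:.6e}")
    # p = 1/e reproduces the criterion (1.34): P_> = 1/N.
    print(f"p = 1/e, N = 250        : {tail_probability_approx(math.exp(-1), 250):.6e}")
```

Turning this tail probability into an actual loss level $\Lambda$ requires a model (or an empirical estimate) of $\mathcal{P}_<$, which is not specified here.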

Interestingly, the distribution of $x_{\max}$ only depends, when $N$ is large,
on the asymptotic behaviour of the distribution of $x$, $P(x)$, when $x \to \infty$.
For example, if $P(x)$ behaves as an exponential when $x \to \infty$, or more
precisely if $\mathcal{P}_>(x) \sim \exp(-\alpha x)$, one finds:

$$\Lambda_{\max} = \frac{\log N}{\alpha}, \qquad (1.39)$$

which grows very slowly with $N$. Setting $x_{\max} = \Lambda_{\max} + u/\alpha$, one finds
that the deviation $u$ around $\Lambda_{\max}$ is distributed according to the Gumbel
distribution:

$$P(u) = e^{-e^{-u}} e^{-u}. \qquad (1.40)$$

The most probable value of this distribution is $u = 0$. This shows that
$\Lambda_{\max}$ is the most probable value of $x_{\max}$. The result (1.40) is actually
much more general, and is valid as soon as $P(x)$ decreases more rapidly

Footnote: For example, for a symmetric exponential distribution $P(x) = \exp(-|x|)/2$, the
median value of the maximum of $N = 10\,000$ variables is only about 8.9, as follows from Eq. (1.37).
[Figure 1.6 legend: $\mu = 3$; Exp; Slope $-1/3$; Effective slope $-1/5$]

Figure 1.6: Amplitude versus rank plots. One plots the value of the $n$-th
variable $\Lambda[n]$ as a function of its rank $n$. If $P(x)$ behaves asymptotically as
a power-law, one obtains a straight line in log-log coordinates, with a slope
equal to $-1/\mu$. For an exponential distribution, one observes an effective slope
which is smaller and smaller as $N/n$ tends to infinity. The points correspond
to synthetic time series of length 5 000, drawn according to a power law with
$\mu = 3$, or according to an exponential. Note that if the axes $x$ and $y$ are
interchanged, then according to Eq. (1.45), one obtains an estimate of the
cumulative distribution, $\mathcal{P}_>$.



than any power-law for $x \to \infty$: the deviation between $\Lambda_{\max}$ (defined
as (1.34)) and $x_{\max}$ is always distributed according to the Gumbel law
(1.40), up to a scaling factor in the definition of $u$.
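The statements about $\Lambda_{\max}$ and $\Lambda_{\rm med}$ can be illustrated numerically. The following is a minimal sketch for purely exponential variables (an assumption made for simplicity; the function names and the seed are arbitrary). It draws each maximum directly by inverting $\mathcal{P}(x_{\max} < \Lambda) = (1 - e^{-\alpha\Lambda})^N$ rather than generating all $N$ variables.

```python
import math
import random


def lambda_max(n: int, alpha: float) -> float:
    """Typical maximum for an exponential tail: P_>(Lambda_max) = 1/n."""
    return math.log(n) / alpha


def lambda_median(n: int, alpha: float) -> float:
    """Median of the maximum: P_>(Lambda_med) = 1 - (1/2)**(1/n)."""
    return -math.log(1.0 - 0.5 ** (1.0 / n)) / alpha


def sample_max_exponential(n: int, alpha: float, rng: random.Random) -> float:
    """Draw the maximum of n IID exponential variables of rate alpha.

    Inverts P(x_max < L) = (1 - exp(-alpha * L))**n = u for a uniform u,
    so one random number per maximum is enough.
    """
    u = rng.random()
    while u == 0.0:  # avoid log(0); essentially never triggered
        u = rng.random()
    tail = -math.expm1(math.log(u) / n)  # 1 - u**(1/n), computed stably
    return -math.log(tail) / alpha


if __name__ == "__main__":
    rng = random.Random(0)
    n, alpha, trials = 1000, 1.0, 20000
    draws = sorted(sample_max_exponential(n, alpha, rng) for _ in range(trials))
    above = sum(d > lambda_max(n, alpha) for d in draws) / trials
    print(f"Lambda_max = {lambda_max(n, alpha):.3f}")
    print(f"fraction of maxima above Lambda_max = {above:.3f}")
    print(f"empirical median = {draws[trials // 2]:.3f}, "
          f"Lambda_med = {lambda_median(n, alpha):.3f}")
```

The fraction of maxima exceeding $\Lambda_{\max}$ should come out close to $1 - 1/e \simeq 0.63$, in line with the discussion following Eq. (1.38).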



The situation is radically different if $P(x)$ decreases as a power law,
cf. (1.14). In this case,

$$\mathcal{P}_>(x) \simeq \frac{A_+^\mu}{x^\mu}, \qquad (1.41)$$

and the typical value of the maximum is given by:

$$\Lambda_{\max} = A_+ N^{1/\mu}. \qquad (1.42)$$

Numerically, for a distribution with $\mu = 3/2$ and a scale factor $A_+ = 1$,
the largest of $N = 10\,000$ variables is on the order of 450, while for
$\mu = 1/2$ it is one hundred million! The complete distribution of the
maximum, called the Fréchet distribution, is given by:

$$P(u) = \frac{\mu}{u^{1+\mu}}\, e^{-1/u^\mu}, \qquad u = \frac{x_{\max}}{A_+ N^{1/\mu}}. \qquad (1.43)$$

Its asymptotic behaviour for $u \to \infty$ is still a power law of exponent
$1 + \mu$. Said differently, both power-law tails and exponential tails are
stable with respect to the 'max' operation. The most probable value of
$x_{\max}$ is now equal to $(\mu/(1+\mu))^{1/\mu}\Lambda_{\max}$. As mentioned above, the limit
$\mu \to \infty$ formally corresponds to an exponential distribution. In this
limit, one indeed recovers $\Lambda_{\max}$ as the most probable value.
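The orders of magnitude quoted above, and the location of the most probable value of the Fréchet law, are easy to verify. This is a small sketch with hypothetical helper names; it only evaluates the formulas $\Lambda_{\max} = A_+ N^{1/\mu}$ and $P(u) = \mu\, u^{-1-\mu} e^{-1/u^\mu}$.

```python
import math


def typical_max(n: int, mu: float, a_plus: float = 1.0) -> float:
    """Typical largest value among n power-law variables: A_+ * n**(1/mu)."""
    return a_plus * n ** (1.0 / mu)


def frechet_density(u: float, mu: float) -> float:
    """Frechet density in the rescaled variable u = x_max / (A_+ n**(1/mu))."""
    return mu / u ** (1.0 + mu) * math.exp(-1.0 / u ** mu)


def frechet_mode(mu: float) -> float:
    """Most probable value of u, i.e. (mu / (1 + mu))**(1/mu)."""
    return (mu / (1.0 + mu)) ** (1.0 / mu)


if __name__ == "__main__":
    print(f"mu = 3/2, N = 10 000: Lambda_max ~ {typical_max(10_000, 1.5):.0f}")
    print(f"mu = 1/2, N = 10 000: Lambda_max ~ {typical_max(10_000, 0.5):.3g}")
    for mu in (0.5, 1.5, 3.0, 50.0):
        print(f"mu = {mu:5.1f}: most probable x_max / Lambda_max = {frechet_mode(mu):.4f}")
```

As $\mu$ grows, the ratio printed in the last loop approaches one, which is the exponential limit mentioned in the text.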



Equation (1.42) allows us to discuss intuitively the divergence of the
mean value for $\mu < 1$ and of the variance for $\mu < 2$. If the mean
value exists, the sum of $N$ random variables is typically equal to $Nm$,
where $m$ is the mean (see also below). But when $\mu < 1$, the largest
encountered value of $X$ is on the order of $N^{1/\mu} \gg N$, and would thus
be larger than the entire sum. Similarly, as discussed below, when the
variance exists, the RMS of the sum is equal to $\sigma\sqrt{N}$. But for $\mu < 2$,
$x_{\max}$ grows faster than $\sqrt{N}$.

More generally, one can rank the random variables $x_i$ in decreasing
order, and ask for an estimate of the $n$-th encountered value, noted $\Lambda[n]$
below. (In particular, $\Lambda[1] = x_{\max}$.) The distribution $P_n$ of $\Lambda[n]$ can be
obtained in full generality as:

$$P_n(\Lambda[n]) = n\, C_N^n\, P(x = \Lambda[n])\, \left(\mathcal{P}(x > \Lambda[n])\right)^{n-1} \left(\mathcal{P}(x < \Lambda[n])\right)^{N-n}. \qquad (1.44)$$

The previous expression means that one has to choose $n$ variables among
$N$ as the $n$ largest ones, single out one of them as the $n$-th largest
(hence the factor $n$), and then assign the corresponding probabilities
to the configuration where $n - 1$ of them are larger than $\Lambda[n]$ and $N - n$
are smaller than $\Lambda[n]$. One can study the position $\Lambda^*[n]$ of the maximum
of $P_n$, and also the width of $P_n$, defined from the second derivative of
$\log P_n$ calculated at $\Lambda^*[n]$. The calculation simplifies in the limit where
$N \to \infty$, $n \to \infty$, with the ratio $n/N$ fixed. In this limit, one finds a
relation which generalises (1.34):

$$\mathcal{P}_>(\Lambda^*[n]) = n/N. \qquad (1.45)$$
The width $w_n$ of the distribution is found to be given by:

$$w_n = \frac{1}{\sqrt{N}}\, \frac{\sqrt{(n/N)(1 - n/N)}}{P(x = \Lambda^*[n])}, \qquad (1.46)$$

which shows that in the limit $N \to \infty$, the value of the $n$-th variable is
more and more sharply peaked around its most probable value $\Lambda^*[n]$,
given by (1.45).



Footnote: A third class of laws stable under 'max' concerns random variables which are
bounded from above - i.e. such that $P(x) = 0$ for $x > x_M$, with $x_M$ finite. This leads
to the Weibull distributions, which we will not consider further in this book.

In the case of an exponential tail, one finds that $\Lambda^*[n] \simeq \log(N/n)/\alpha$;
while in the case of power-law tails, one rather obtains:

$$\Lambda^*[n] \simeq A_+ \left(\frac{N}{n}\right)^{1/\mu}. \qquad (1.47)$$

This last equation shows that, for power-law variables, the encountered
values are hierarchically organised: for example, the ratio of the largest
value $x_{\max} \equiv \Lambda[1]$ to the second largest $\Lambda[2]$ is on the order of $2^{1/\mu}$,
which becomes larger and larger as $\mu$ decreases, and conversely tends to
one when $\mu \to \infty$.



The property (1.47) is very useful to identify empirically the nature
of the tails of a probability distribution. One sorts in decreasing order
the set of observed values $\{x_1, x_2, \ldots, x_N\}$ and one simply draws $\Lambda[n]$ as
a function of $n$. If the variables are power-law distributed, this graph
should be a straight line in log-log plot, with a slope $-1/\mu$, as given by
(1.47) (Fig. 1.6). On the same figure, we have shown the result obtained
for exponentially distributed variables. On this diagram, one observes
an approximately straight line, but with an effective slope which varies
with the total number of points $N$: the slope is less and less as $N/n$
grows larger. In this sense, the formal remark made above, that an
exponential distribution could be seen as a power law with $\mu \to \infty$,
becomes somewhat more concrete. Note that if the axes $x$ and $y$ of
Fig. 1.6 are interchanged, then according to Eq. (1.45), one obtains an
estimate of the cumulative distribution, $\mathcal{P}_>$.
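The rank-ordering procedure just described can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' code: the data are synthetic pure Pareto variables drawn with `random.paretovariate`, the slope is obtained by an ordinary least-squares fit, and the range of ranks used for the fit (`n_min`, `n_max`) is an arbitrary choice.

```python
import math
import random


def rank_plot_slope(values, n_min: int, n_max: int) -> float:
    """Least-squares slope of log Lambda[n] versus log n for ranks
    n_min..n_max, where rank 1 is the largest observed value."""
    ranked = sorted(values, reverse=True)
    xs = [math.log(n) for n in range(n_min, n_max + 1)]
    ys = [math.log(ranked[n - 1]) for n in range(n_min, n_max + 1)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(7)
    for mu in (1.5, 3.0):
        sample = [rng.paretovariate(mu) for _ in range(20000)]
        slope = rank_plot_slope(sample, n_min=10, n_max=2000)
        print(f"mu = {mu}: fitted slope = {slope:.3f}, expected -1/mu = {-1 / mu:.3f}")
```

On real data the fit should be restricted to the ranks where the tail is actually in its asymptotic regime, which has to be judged from the plot itself.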



Let us finally note another property of power laws, potentially
interesting for their empirical determination. If one computes the average
value of $x$ conditioned to a certain minimum value $\Lambda$:

$$\langle x \rangle_\Lambda = \frac{\int_\Lambda^\infty dx\, x\, P(x)}{\int_\Lambda^\infty dx\, P(x)}, \qquad (1.48)$$

then, if $P(x)$ decreases as in (1.14), one finds, for $\Lambda \to \infty$,

$$\langle x \rangle_\Lambda = \frac{\mu}{\mu - 1}\,\Lambda, \qquad (1.49)$$

independently of the tail amplitude $A_+^\mu$. The average $\langle x \rangle_\Lambda$ is thus
always of order of $\Lambda$ itself, with a proportionality factor which diverges
as $\mu \to 1$.
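Relation (1.49) suggests a simple estimator of $\mu$: measure the ratio $\langle x \rangle_\Lambda / \Lambda$ and invert it. The sketch below assumes synthetic Pareto data with $\mu = 3$ (for which the relation holds exactly above the lower cut-off); the function names and the threshold are illustrative choices.

```python
import random


def conditional_mean_ratio(values, threshold: float) -> float:
    """Empirical <x>_Lambda / Lambda, using only the values above threshold."""
    above = [v for v in values if v > threshold]
    if not above:
        raise ValueError("no observation exceeds the threshold")
    return (sum(above) / len(above)) / threshold


def exponent_from_ratio(ratio: float) -> float:
    """Invert Eq. (1.49): ratio = mu / (mu - 1)  =>  mu = ratio / (ratio - 1)."""
    return ratio / (ratio - 1.0)


if __name__ == "__main__":
    rng = random.Random(3)
    data = [rng.paretovariate(3.0) for _ in range(100_000)]
    ratio = conditional_mean_ratio(data, threshold=2.0)
    print(f"<x>_Lambda / Lambda = {ratio:.3f} (expected 1.5)")
    print(f"estimated mu        = {exponent_from_ratio(ratio):.2f} (expected 3)")
```

The estimate becomes noisy when $\mu$ approaches 2 from above and is not meaningful for $\mu \le 1$, where the conditional mean itself diverges.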



Footnote: This means that $\mu$ can be determined by a one parameter fit only.

1.5 Sums of random variables



In order to describe the statistics of future prices of a financial asset, one 
a priori needs a distribution density for all possible time intervals, corre- 
sponding to different trading time horizons. For example, the distribu- 
tion of five minutes price fluctuations is different from the one describing 
daily fluctuations, itself different for the weekly, monthly, etc. variations. 
But in the case where the fluctuations are independent and identically
distributed (iid) - an assumption which is however not always justified,
see 1.7 and 2.4 - it is possible to reconstruct the distributions
corresponding to different time scales from the knowledge of that describing short
time scales only. In this context, Gaussians and Levy distributions play 
a special role, because they are stable: if the short time scale distribu- 
tion is a stable law, then the fluctuations on all time scales are described 
by the same stable law - only the parameters of the stable law must 
be changed (in particular its width). More generally, if one sums IID 
variables, then, independently of the short time distribution, the law de- 
scribing long times converges towards one of the stable laws: this is the 
content of the 'central limit theorem' (clt). In practice, however, this 
convergence can be very slow and thus of limited interest, in particular 
if one is concerned about short time scales. 



1.5.1 Convolutions 

What is the distribution of the sum of two independent random
variables? This sum can for example represent the variation of price of an
asset between today and the day after tomorrow ($X$), which is the sum
of the increment between today and tomorrow ($X_1$) and between tomorrow
and the day after tomorrow ($X_2$), both assumed to be random and
independent.

Let us thus consider $X = X_1 + X_2$ where $X_1$ and $X_2$ are two random
variables, independent, and distributed according to $P_1(x_1)$ and $P_2(x_2)$,
respectively. The probability that $X$ is equal to $x$ (within $dx$) is given by
the sum over all possibilities of obtaining $X = x$ (that is all combinations
of $X_1 = x_1$ and $X_2 = x_2$ such that $x_1 + x_2 = x$), weighted by their
respective probabilities. The variables $X_1$ and $X_2$ being independent, the
joint probability that $X_1 = x_1$ and $X_2 = x - x_1$ is equal to $P_1(x_1)P_2(x -
x_1)$, from which one obtains:

$$P(x, N = 2) = \int dx'\, P_1(x')\, P_2(x - x'). \qquad (1.50)$$

This equation defines the convolution between $P_1(x)$ and $P_2(x)$, which we
shall write $P = P_1 \star P_2$. The generalisation to the sum of $N$ independent
random variables is immediate. If $X = X_1 + X_2 + \ldots + X_N$ with $X_i$
distributed according to $P_i(x_i)$, the distribution of $X$ is obtained as:

$$P(x, N) = \int \prod_{i=1}^{N-1} dx_i'\, P_1(x_1') \ldots P_{N-1}(x_{N-1}')\, P_N(x - x_1' - \ldots - x_{N-1}'). \qquad (1.51)$$

One thus understands how powerful is the hypothesis that the increments
are IID, i.e., that $P_1 = P_2 = \ldots = P_N$. Indeed, according to this
hypothesis, one only needs to know the distribution of increments over
a unit time interval to reconstruct that of increments over an interval of
length $N$: it is simply obtained by convoluting the elementary distribution
$N$ times with itself.
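For variables taking integer values, the integral in (1.50) becomes a finite sum and the $N$-fold convolution can be written out directly. The following sketch assumes a discrete elementary distribution stored as a list `p`, with `p[k]` the probability of the value $k$ (a simplification introduced here for illustration).

```python
def convolve(p, q):
    """Distribution of X1 + X2 for independent integer variables with
    probabilities p[k] and q[k]: discrete analogue of Eq. (1.50)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


def n_fold_convolution(p, n: int):
    """Convolute the elementary distribution p with itself n times."""
    result = [1.0]  # distribution of an empty sum: all the weight on 0
    for _ in range(n):
        result = convolve(result, p)
    return result


def mean_and_variance(p):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(k * p_k for k, p_k in enumerate(p))
    var = sum((k - mean) ** 2 * p_k for k, p_k in enumerate(p))
    return mean, var


if __name__ == "__main__":
    elementary = [0.5, 0.3, 0.2]
    m1, v1 = mean_and_variance(elementary)
    total = n_fold_convolution(elementary, 10)
    m10, v10 = mean_and_variance(total)
    print(f"N = 1 : mean = {m1:.3f}, variance = {v1:.3f}")
    print(f"N = 10: mean = {m10:.3f}, variance = {v10:.3f}")
```

The printed mean and variance for $N = 10$ are ten times the elementary ones, as expected for independent increments.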



The analytical or numerical manipulations of Eqs. (1.50) and (1.51)
are much eased by the use of Fourier transforms, for which convolutions
become simple products. The equation $P(x, N = 2) = [P_1 \star P_2](x)$,
reads in Fourier space:

$$\hat P(z, N = 2) = \int dx\, e^{iz(x - x' + x')} \int dx'\, P_1(x')\, P_2(x - x') \equiv \hat P_1(z)\, \hat P_2(z). \qquad (1.52)$$

In order to obtain the $N$-th convolution of a function with itself, one
should raise its characteristic function to the power $N$, and then take
its inverse Fourier transform.
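The recipe can be made concrete for integer-valued variables, where the Fourier transform reduces to a finite discrete transform. The sketch below is a naive $O(n^2)$ implementation written for clarity rather than speed; the grid size and the function names are choices made here, and a production code would use a fast Fourier transform instead.

```python
import cmath


def characteristic_function(p, z: float) -> complex:
    """sum_k p[k] * exp(i z k) for a distribution on the integers 0, 1, ..."""
    return sum(p_k * cmath.exp(1j * z * k) for k, p_k in enumerate(p))


def n_fold_via_fourier(p, n: int):
    """n-fold self-convolution of p: raise the characteristic function to
    the power n, then invert the discrete Fourier transform on a grid just
    large enough to hold the support of the sum."""
    size = n * (len(p) - 1) + 1
    zs = [2.0 * cmath.pi * m / size for m in range(size)]
    powered = [characteristic_function(p, z) ** n for z in zs]
    result = []
    for k in range(size):
        total = sum(powered[m] * cmath.exp(-1j * zs[m] * k) for m in range(size))
        result.append((total / size).real)
    return result


if __name__ == "__main__":
    coin = [0.5, 0.5]
    # Three fair coin tosses: binomial weights 1/8, 3/8, 3/8, 1/8.
    print([round(x, 6) for x in n_fold_via_fourier(coin, 3)])
```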



1.5.2 Additivity of cumulants and of tail amplitudes 

It is clear that the mean of the sum of two random variables (independent
or not) is equal to the sum of the individual means. The mean is
thus additive under convolution. Similarly, if the random variables are
independent, one can show that their variances (if they are well defined)
also add simply. More generally, all the cumulants ($c_n$) of two independent
distributions simply add. This follows from the fact that since the
characteristic functions multiply, their logarithms add. The additivity of
cumulants is then a simple consequence of the linearity of derivation.

The cumulants of a given law convoluted $N$ times with itself thus
follow the simple rule $c_{n,N} = N c_{n,1}$ where the $\{c_{n,1}\}$ are the cumulants of
the elementary distribution $P_1$. Since the cumulant $c_n$ has the dimension
of $X$ to the power $n$, its relative importance is best measured in terms
of the normalised cumulants:

$$\lambda_n^N \equiv \frac{c_{n,N}}{(c_{2,N})^{n/2}} = \frac{c_{n,1}}{(c_{2,1})^{n/2}}\, N^{1 - n/2}. \qquad (1.53)$$

The normalised cumulants thus decay with $N$ for $n > 2$; the higher the
cumulant, the faster the decay: $\lambda_n^N \propto N^{1-n/2}$. The kurtosis $\kappa$, defined
above as the fourth normalised cumulant, thus decreases as $1/N$. This
is basically the content of the CLT: when $N$ is very large, the cumulants
of order $> 2$ become negligible. Therefore, the distribution of the sum
is only characterised by its first two cumulants (mean and variance): it
is a Gaussian.
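The decay of the normalised cumulants with $N$ is a one-line formula, illustrated below. The numerical values used in the example (skewness 2 and kurtosis 6 for an exponential elementary distribution) are standard results assumed here for illustration; the function name is hypothetical.

```python
def normalised_cumulant_of_sum(lambda_n_1: float, n: int, n_terms: int) -> float:
    """Normalised cumulant of order n for a sum of n_terms IID variables,
    given its value lambda_n_1 for a single variable: the cumulants add,
    so lambda_n scales as n_terms**(1 - n/2)."""
    return lambda_n_1 * n_terms ** (1.0 - n / 2.0)


if __name__ == "__main__":
    # Exponential elementary distribution: lambda_3 = 2, lambda_4 = 6.
    for n_terms in (1, 10, 100, 1000):
        skew = normalised_cumulant_of_sum(2.0, 3, n_terms)
        kurt = normalised_cumulant_of_sum(6.0, 4, n_terms)
        print(f"N = {n_terms:5d}: skewness = {skew:.4f}, kurtosis = {kurt:.4f}")
```

The skewness decays as $N^{-1/2}$ and the kurtosis as $N^{-1}$, so the third cumulant, when it is non zero, controls the slowest approach to the Gaussian.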

Let us now turn to the case where the elementary distribution $P_1(x_1)$
decreases as a power law for large arguments $x_1$ (cf. (1.14)), with a
certain exponent $\mu$. The cumulants of order higher than $\mu$ are thus
divergent. By studying the small $z$ singular expansion of the Fourier
transform of $P(x, N)$, one finds that the above additivity property of
cumulants is bequeathed to the tail amplitudes $A_\pm^\mu$: the distribution
of the sum $P(x, N)$ still behaves asymptotically as a power-law
(which is thus conserved by addition for all values of $\mu$, provided one
takes the limit $x \to \infty$ before $N \to \infty$ - see the discussion in 1.6.3), with
a tail amplitude given by:

$$A_{\pm,N}^\mu \equiv N A_\pm^\mu. \qquad (1.54)$$

The tail parameter thus plays the role, for power-law variables, of a
generalised cumulant.



1.5.3 Stable distributions and self-similarity

If one adds random variables distributed according to an arbitrary law
$P_1(x_1)$, one constructs a random variable which has, in general, a
different probability distribution ($P(x, N) = [P_1(x_1)]^{\star N}$). However, for
certain special distributions, the law of the sum has exactly the same
shape as the elementary distribution - these are called stable laws. The
fact that two distributions have the 'same shape' means that one can
find a ($N$ dependent) translation and dilation of $x$ such that the two
laws coincide:

$$P(x, N)\,dx = P_1(x_1)\,dx_1 \quad \text{where} \quad x = a_N x_1 + b_N. \qquad (1.55)$$

The distribution of increments on a certain time scale (week, month, 
year) is thus scale invariant, provided the variable X is properly rescaled. 
In this case, the chart giving the evolution of the price of a financial asset 
as a function of time has the same statistical structure, independently 
of the chosen elementary time scale - only the average slope and the 
amplitude of the fluctuations are different. These charts are then called 
self-similar, or, using a better terminology introduced by Mandelbrot, 



self-affine (Figs. 1.7 and 1

Figure 1.7: Example of a self-affine function, obtained by summing random
variables. One plots the sum $x$ as a function of the number of terms $N$ in
the sum, for a Gaussian elementary distribution $P_1(x_1)$. Several successive
'zooms' reveal the self-similar nature of the function, here with $a_N = N^{1/2}$.

The family of all possible stable laws coincides (for continuous
variables) with the Lévy distributions defined above, which include
Gaussians as the special case $\mu = 2$. This is easily seen in Fourier space,
using the explicit shape of the characteristic function of the Lévy
distributions. We shall specialise here for simplicity to the case of symmetric
distributions $P_1(x_1) = P_1(-x_1)$, for which the translation factor is zero
($b_N \equiv 0$). The scale parameter is then given by $a_N = N^{1/\mu}$, and one
finds, for $q < \mu$:

$$\langle |x|^q \rangle^{1/q} \propto A N^{1/\mu}, \qquad q < \mu, \qquad (1.56)$$

where $A = A_+ = A_-$. In words, the above equation means that the order
of magnitude of the fluctuations on 'time' scale $N$ is a factor $N^{1/\mu}$ larger
than the fluctuations on the elementary time scale. However, once this
factor is taken into account, the probability distributions are identical.
One should notice that the smaller the value of $\mu$, the faster the growth of
fluctuations with time.



1.6 Central limit theorem 

We have thus seen that the stable laws (Gaussian and Levy distributions) 
are 'fixed points' of the convolution operation. These fixed points are 
actually also attractors, in the sense that any distribution convoluted 
with itself a large number of times finally converges towards a stable 
law (apart from some very pathological cases) . Said differently, the limit 
distribution of the sum of a large number of random variables is a stable 
law. The precise formulation of this result is known as the central limit 
theorem (clt). 



1.6.1 Convergence to a Gaussian 

The classical formulation of the CLT deals with the convergence of sums
of IID random variables of finite variance $\sigma^2$ towards a Gaussian. In a
more precise way, the result is then the following:

$$\lim_{N \to \infty} \mathcal{P}\left(u_1 \le \frac{x - mN}{\sigma\sqrt{N}} \le u_2\right) = \int_{u_1}^{u_2} \frac{du}{\sqrt{2\pi}}\, e^{-u^2/2}, \qquad (1.57)$$

for all finite $u_1, u_2$. Note however that for finite $N$, the distribution of the
sum $X = X_1 + \ldots + X_N$ in the tails (corresponding to extreme events)
can be very different from the Gaussian prediction; but the weight of
these non-Gaussian regions tends to zero when $N$ goes to infinity. The
Footnote: For discrete variables, one should also add the Poisson distribution (1.27).
Footnote: The case $\mu = 1$ is special and involves extra logarithmic factors.

Figure 1.8: In this case, the elementary distribution $P_1(x_1)$ decreases as a
power-law with an exponent $\mu = 1.5$. The scale factor is now given by $a_N =
N^{2/3}$. Note that, contrary to the previous graph, one clearly observes the
presence of sudden 'jumps', which reflect the existence of very large values of
the elementary increment $x_1$.

CLT only concerns the central region, which keeps a finite weight for $N$
large: we shall come back in detail to this point below.

The main hypotheses ensuring the validity of the Gaussian CLT are
the following:

• The $X_i$ must be independent random variables, or at least not
'too' correlated (the correlation function $\langle x_i x_j \rangle - m^2$ must decay
sufficiently fast when $|i - j|$ becomes large - see 1.7.1 below). For
example, in the extreme case where all the $X_i$ are perfectly
correlated (i.e. they are all equal), the distribution of $X$ is obviously
the same as that of the individual $X_i$ (once the factor $N$ has been
properly taken into account).

• The random variables $X_i$ need not necessarily be identically
distributed. One must however require that the variances of all these
distributions are not too dissimilar, so that no one of the
variances dominates over all the others (as would be the case, for
example, if the variances were themselves distributed as a power-law
with an exponent $\mu < 1$). In this case, the variance of the
Gaussian limit distribution is the average of the individual
variances. This also allows one to deal with sums of the type $X =
p_1X_1 + p_2X_2 + \ldots + p_NX_N$, where the $p_i$ are arbitrary coefficients;
this case is relevant in many circumstances, in particular in the
Portfolio theory (cf. Chapter 3).

• Formally, the CLT only applies in the limit where $N$ is infinite.
In practice, $N$ must be large enough for a Gaussian to be a good
approximation of the distribution of the sum. The minimum
required value of $N$ (called $N^*$ below) depends on the elementary
distribution $P_1(x_1)$ and its distance from a Gaussian. Also, $N^*$
depends on how far in the tails one requires a Gaussian to be a
good approximation, which takes us to the next point.

• As mentioned above, the CLT does not tell us anything about the
tails of the distribution of $X$; only the central part of the
distribution is well described by a Gaussian. The 'central' region means
a region of width at least on the order of $\sqrt{N}\sigma$ around the mean
value of $X$. The actual width of the region where the Gaussian
turns out to be a good approximation for large finite $N$ crucially
depends on the elementary distribution $P_1(x_1)$. This problem will
be explored in Section 1.6.3. Roughly speaking, this region is of
width $\sim N^{3/4}\sigma$ for 'narrow' symmetric elementary distributions,
such that all even moments are finite. This region is however
sometimes of much smaller extension: for example, if $P_1(x_1)$ has
power-law tails with $\mu > 2$ (such that $\sigma$ is finite), the Gaussian 'realm'
grows barely faster than $\sqrt{N}$ (as $\sim \sqrt{N \log N}$).

The above formulation of the CLT requires the existence of a finite
variance. This condition can be somewhat weakened to include some
'marginal' distributions such as a power-law with $\mu = 2$. In this case
the scale factor is not $a_N = \sqrt{N}$ but rather $a_N = \sqrt{N \ln N}$. However,
as we shall discuss in the next section, elementary distributions which
decay more slowly than $|x|^{-3}$ do not belong to the Gaussian basin of
attraction. More precisely, the necessary and sufficient condition for
$P_1(x_1)$ to belong to this basin is that:

$$\lim_{u \to \infty} u^2\, \frac{\mathcal{P}_{1<}(-u) + \mathcal{P}_{1>}(u)}{\int_{|u'| < u} du'\, u'^2 P_1(u')} = 0. \qquad (1.58)$$

This condition is always satisfied if the variance is finite, but allows one
to include the marginal cases such as a power-law with $\mu = 2$.



The central limit theorem and information theory 



It is interesting to notice that the Gaussian is the law of maximum
entropy - or minimum information - such that its variance is fixed. The
missing information quantity $\mathcal{I}$ (or entropy) associated with a
probability distribution $P$ is defined as:

$$\mathcal{I}[P] \equiv -\int dx\, P(x) \log[P(x)]. \qquad (1.59)$$

The distribution maximising $\mathcal{I}[P]$ for a given value of the variance is
obtained by taking a functional derivative with respect to $P(x)$:

$$\frac{\partial}{\partial P(x)}\left[\mathcal{I}[P] - \zeta \int dx'\, x'^2 P(x') - \zeta' \int dx'\, P(x')\right] = 0, \qquad (1.60)$$

where $\zeta$ is fixed by the condition $\int dx\, x^2 P(x) = \sigma^2$ and $\zeta'$ by the
normalisation of $P(x)$. It is immediate to show that the solution to
(1.60) is indeed the Gaussian. The numerical value of its entropy is:

$$\mathcal{I}_G = \frac{1}{2} + \frac{1}{2}\log(2\pi) + \log(\sigma) \simeq 1.419 + \log(\sigma). \qquad (1.61)$$

For comparison, one can compute the entropy of the symmetric
exponential distribution, which is:

$$\mathcal{I}_E = 1 + \frac{\log 2}{2} + \log(\sigma) \simeq 1.346 + \log(\sigma). \qquad (1.62)$$
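The two entropy values can be checked by direct numerical integration of $-\int dx\, P \log P$ for unit variance. The sketch below uses a plain midpoint rule on a truncated interval; the integration bounds, the number of steps and the function names are arbitrary choices made for this illustration.

```python
import math


def entropy_numeric(density, lo: float, hi: float, steps: int = 80_000) -> float:
    """Midpoint-rule estimate of -integral of P(x) log P(x) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        p = density(lo + (k + 0.5) * h)
        if p > 0.0:  # P log P -> 0 where the density underflows to zero
            total -= p * math.log(p) * h
    return total


def gaussian(x: float, sigma: float = 1.0) -> float:
    """Gaussian density of zero mean and standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def symmetric_exponential(x: float, sigma: float = 1.0) -> float:
    """Symmetric exponential density (alpha/2) exp(-alpha |x|), with
    alpha chosen so that the variance 2 / alpha**2 equals sigma**2."""
    alpha = math.sqrt(2.0) / sigma
    return 0.5 * alpha * math.exp(-alpha * abs(x))


if __name__ == "__main__":
    print(f"Gaussian    : {entropy_numeric(gaussian, -40.0, 40.0):.4f}")
    print(f"Exponential : {entropy_numeric(symmetric_exponential, -40.0, 40.0):.4f}")
```

For $\sigma = 1$ the Gaussian value is the larger of the two, as it must be for the maximum entropy law at fixed variance.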



It is important to understand that the convolution operation is
'information burning', since all the details of the elementary distribution
$P_1(x_1)$ progressively disappear while the Gaussian distribution emerges.

1.6.2 Convergence to a Lévy distribution

Let us now turn to the case of the sum of a large number $N$ of IID
random variables, asymptotically distributed as a power-law with $\mu < 2$,
and with a tail amplitude $A^\mu = A_+^\mu = A_-^\mu$ (cf. (1.14)). The variance of
the distribution is thus infinite. The limit distribution for large $N$ is then
a stable Lévy distribution of exponent $\mu$ and with a tail amplitude $N A^\mu$.
If the positive and negative tails of the elementary distribution $P_1(x_1)$
are characterised by different amplitudes ($A_-^\mu$ and $A_+^\mu$) one then obtains
an asymmetric Lévy distribution with parameter $\beta = (A_+^\mu - A_-^\mu)/(A_+^\mu +
A_-^\mu)$. If the 'left' exponent is different from the 'right' exponent ($\mu_- \neq
\mu_+$), then the smallest of the two wins and one finally obtains a totally
asymmetric Lévy distribution ($\beta = -1$ or $\beta = 1$) with exponent $\mu =
\min(\mu_-, \mu_+)$. The CLT generalised to Lévy distributions applies with
the same precautions as in the Gaussian case above.

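Technically, a distribution $P_1(x_1)$ belongs to the attraction basin of
the Lévy distribution $L_{\mu,\beta}$ if and only if:

$$\lim_{u \to \infty} \frac{\mathcal{P}_{1<}(-u)}{\mathcal{P}_{1>}(u)} = \frac{1 - \beta}{1 + \beta}; \qquad (1.63)$$

and for all $r$,

$$\lim_{u \to \infty} \frac{\mathcal{P}_{1<}(-u) + \mathcal{P}_{1>}(u)}{\mathcal{P}_{1<}(-ru) + \mathcal{P}_{1>}(ru)} = r^\mu. \qquad (1.64)$$

A distribution with an asymptotic tail given by (1.14) is such that,

$$\mathcal{P}_{1<}(u) \underset{u \to -\infty}{\simeq} \frac{A_-^\mu}{|u|^\mu} \quad \text{and} \quad \mathcal{P}_{1>}(u) \underset{u \to \infty}{\simeq} \frac{A_+^\mu}{u^\mu}, \qquad (1.65)$$

and thus belongs to the attraction basin of the Lévy distribution of
exponent $\mu$ and asymmetry parameter $\beta = (A_+^\mu - A_-^\mu)/(A_+^\mu + A_-^\mu)$.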

1.6.3 Large deviations 

The CLT teaches us that the Gaussian approximation is justified to de- 
scribe the 'central' part of the distribution of the sum of a large number 
of random variables (of finite variance). However, the definition of the 
centre has remained rather vague up to now. The CLT only states that 
the probability of finding an event in the tails goes to zero for large N. 
In the present section, we characterise more precisely the region where 
the Gaussian approximation is valid. 

If $X$ is the sum of $N$ IID random variables of mean $m$ and variance
$\sigma^2$, one defines a 'rescaled variable' $U$ as:

$$U = \frac{X - Nm}{\sigma\sqrt{N}}, \qquad (1.66)$$

which according to the CLT tends towards a Gaussian variable of zero
mean and unit variance. Hence, for any fixed $u$, one has:

$$\lim_{N \to \infty} \mathcal{P}_>(u) = \mathcal{P}_{G>}(u), \qquad (1.67)$$

where $\mathcal{P}_{G>}(u)$ is related to the error function, and describes the
weight contained in the tails of the Gaussian:

$$\mathcal{P}_{G>}(u) = \int_u^\infty \frac{du'}{\sqrt{2\pi}} \exp(-u'^2/2) = \frac{1}{2}\,\mathrm{erfc}\left(\frac{u}{\sqrt{2}}\right). \qquad (1.68)$$

However, the above convergence is not uniform. The value of $N$ such
that the approximation $\mathcal{P}_>(u) \simeq \mathcal{P}_{G>}(u)$ becomes valid depends on $u$.
Conversely, for fixed $N$, this approximation is only valid for $u$ not too
large: $|u| \ll u_0(N)$.

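One can estimate $u_0(N)$ in the case where the elementary distribution
$P_1(x_1)$ is 'narrow', that is, decreasing faster than any power-law when
$|x_1| \to \infty$, such that all the moments are finite. In this case, all the
cumulants of $P_1$ are finite and one can obtain a systematic expansion in
powers of $N^{-1/2}$ of the difference $\Delta\mathcal{P}_>(u) \equiv \mathcal{P}_>(u) - \mathcal{P}_{G>}(u)$,

$$\Delta\mathcal{P}_>(u) \simeq \frac{\exp(-u^2/2)}{\sqrt{2\pi}} \left(\frac{Q_1(u)}{N^{1/2}} + \frac{Q_2(u)}{N} + \ldots + \frac{Q_k(u)}{N^{k/2}} + \ldots\right), \qquad (1.69)$$

where the $Q_k(u)$ are polynomial functions which can be expressed in
terms of the normalised cumulants $\lambda_n$ (cf. (1.12)) of the elementary
distribution. More explicitly, the first two terms are given by:

$$Q_1(u) = \frac{1}{6}\lambda_3 (u^2 - 1), \qquad (1.70)$$

and

$$Q_2(u) = \frac{1}{72}\lambda_3^2 u^5 + \frac{1}{8}\left(\frac{1}{3}\lambda_4 - \frac{10}{9}\lambda_3^2\right) u^3 + \left(\frac{5}{24}\lambda_3^2 - \frac{1}{8}\lambda_4\right) u. \qquad (1.71)$$

One recovers the fact that if all the cumulants of $P_1(x_1)$ of order
larger than two are zero, all the $Q_k$ are also identically zero and so is
the difference between $P(x, N)$ and the Gaussian.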
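The size of the leading correction can be seen on a case where the exact answer is available. The sketch below assumes exponential elementary variables with unit mean and unit variance, for which $\lambda_3 = 2$ and the exact tail of the sum is that of a Gamma law of integer index (a standard closed form). It compares the Gaussian tail $\frac{1}{2}\mathrm{erfc}(u/\sqrt{2})$ with and without the term in $Q_1(u) = \lambda_3(u^2 - 1)/6$; the function names are illustrative.

```python
import math


def gaussian_tail(u: float) -> float:
    """Weight of the Gaussian beyond u: erfc(u / sqrt(2)) / 2."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def first_correction(u: float, lambda3: float, n: int) -> float:
    """Leading correction exp(-u^2/2) / sqrt(2 pi) * Q_1(u) / sqrt(n),
    with Q_1(u) = lambda3 * (u^2 - 1) / 6."""
    q1 = lambda3 * (u * u - 1.0) / 6.0
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi) * q1 / math.sqrt(n)


def exact_tail_sum_of_exponentials(u: float, n: int) -> float:
    """Exact P_>(u) for the rescaled sum of n unit-mean exponential
    variables: the sum follows a Gamma law of index n, whose tail is
    exp(-x) * sum_{k < n} x**k / k!  with x = n + u * sqrt(n)."""
    x = n + u * math.sqrt(n)
    if x <= 0.0:
        return 1.0
    term = math.exp(-x)
    total = term
    for k in range(1, n):
        term *= x / k
        total += term
    return total


if __name__ == "__main__":
    n, lambda3 = 20, 2.0
    for u in (0.5, 1.0, 2.0, 3.0):
        exact = exact_tail_sum_of_exponentials(u, n)
        gauss = gaussian_tail(u)
        corrected = gauss + first_correction(u, lambda3, n)
        print(f"u = {u:.1f}: exact = {exact:.5f}, Gaussian = {gauss:.5f}, "
              f"with Q_1 term = {corrected:.5f}")
```

The further one goes in the tail at fixed $N$, the larger the relative error of the pure Gaussian estimate, which is the non-uniform convergence discussed above.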

For a general asymmetric elementary distribution $P_1$, $\lambda_3$ is non zero.
The leading term in the above expansion when $N$ is large is thus $Q_1(u)$.
For the Gaussian approximation to be meaningful, one must at least
require that this term is small in the central region where $u$ is of order
one, which corresponds to $x - mN \sim \sigma\sqrt{N}$. This thus imposes that
$N \gg N^* = \lambda_3^2$. The Gaussian approximation remains valid whenever
the relative error is small compared to 1. For large $u$ (which will be
justified for large $N$), the relative error is obtained by dividing Eq. (1.69)
by $\mathcal{P}_{G>}(u) \simeq \exp(-u^2/2)/(u\sqrt{2\pi})$. One then obtains the following
condition:

$$\lambda_3 u^3 \ll N^{1/2} \quad \text{i.e.} \quad |x - Nm| \ll \sigma\sqrt{N} \left(\frac{N}{N^*}\right)^{1/6}. \qquad (1.72)$$

This shows that the central region has an extension growing as $N^{2/3}$.

A symmetric elementary distribution is such that $\lambda_3 \equiv 0$; it is then
the kurtosis $\kappa = \lambda_4$ that fixes the first correction to the Gaussian when
$N$ is large, and thus the extension of the central region. The conditions
now read: $N \gg N^* = \lambda_4$, and

$$\lambda_4 u^4 \ll N \quad \text{i.e.} \quad |x - Nm| \ll \sigma\sqrt{N} \left(\frac{N}{N^*}\right)^{1/4}. \qquad (1.73)$$

The central region now extends over a region of width $N^{3/4}$.

The results of the present section do not directly apply if the
elementary distribution $P_1(x_1)$ decreases as a power-law ('broad distribution').
In this case, some of the cumulants are infinite and the above cumulant
expansion (1.69) is meaningless. In the next section, we shall see that
in this case the 'central' region is much more restricted than in the case
of 'narrow' distributions. We shall then describe in Section 1.6.5 the
case of 'truncated' power-law distributions, where the above conditions
become asymptotically relevant. These laws however may have a very
large kurtosis, which depends on the point where the truncation becomes
noticeable, and the above condition $N \gg \lambda_4$ can be hard to satisfy.

Cramér function

More generally, when $N$ is large, one can write the distribution of
the sum of $N$ IID random variables as:

$$P(x, N) \underset{N \to \infty}{\simeq} \exp\left[-N S\left(\frac{x}{N}\right)\right], \qquad (1.74)$$

where $S$ is the so-called Cramér function, which gives some information
about the probability of $X$ even outside the 'central' region. When the
variance is finite, $S$ grows as $S(u) \propto u^2$ for small $u$'s, which again leads
to a Gaussian central region. For finite $u$, $S$ can be computed using
Laplace's saddle point method, valid for $N$ large. By definition:

$$P(x, N) \equiv \int \frac{dz}{2\pi} \exp N\left(-iz\frac{x}{N} + \log[\hat P_1(z)]\right). \qquad (1.75)$$

Footnote: The above arguments can actually be made fully rigorous, see [Feller].
Footnote: We assume that their mean is zero, which can always be achieved through a
suitable shift of $x_1$.

When $N$ is large, the above integral is dominated by the neighbourhood
of the point $z^*$ where the term in the exponential is stationary. The
results can be written as:

$$P(x, N) \simeq \exp\left[-N S\left(\frac{x}{N}\right)\right], \qquad (1.76)$$

with $S(u)$ given by:

$$\left.\frac{d\log[\hat P_1(z)]}{dz}\right|_{z = z^*} = iu, \qquad S(u) = i z^* u - \log[\hat P_1(z^*)], \qquad (1.77)$$

which, in principle, allows one to estimate $P(x, N)$ even outside the
central region. Note that if $S(u)$ is finite for finite $u$, the corresponding
probability is exponentially small in $N$.
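As an illustration, the Cramér function can be computed for exponential variables and compared with the exact distribution of the sum. The sketch below rests on several assumptions that are not in the text: the saddle point is taken on the imaginary axis, $z^* = -i\theta$ with $\theta$ real, so that $S(u) = \max_\theta [\theta u - \log M(\theta)]$ with $M(\theta) = \hat P_1(-i\theta)$; the variables are unit-mean exponentials that are not shifted to zero mean, so that $S$ vanishes at $u = 1$ rather than at $u = 0$; and the maximisation is done by a simple ternary search.

```python
import math


def log_mgf_exponential(theta: float) -> float:
    """log M(theta) for a unit-mean exponential variable (theta < 1)."""
    return -math.log(1.0 - theta)


def cramer_numeric(u: float, log_mgf, theta_lo: float, theta_hi: float,
                   iterations: int = 200) -> float:
    """Maximise theta * u - log_mgf(theta) over [theta_lo, theta_hi] by
    ternary search; the objective is concave in theta."""
    def objective(theta: float) -> float:
        return theta * u - log_mgf(theta)

    lo, hi = theta_lo, theta_hi
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))


def cramer_exponential_closed_form(u: float) -> float:
    """Closed form for unit-mean exponentials: S(u) = u - 1 - log(u)."""
    return u - 1.0 - math.log(u)


def minus_log_density_per_term(u: float, n: int) -> float:
    """-(1/n) log P(x = n u, n) for the exact Gamma density of index n."""
    x = n * u
    log_p = (n - 1) * math.log(x) - x - math.lgamma(n)
    return -log_p / n


if __name__ == "__main__":
    for u in (0.5, 1.0, 2.0, 4.0):
        s_num = cramer_numeric(u, log_mgf_exponential, -50.0, 1.0 - 1e-12)
        s_exact = cramer_exponential_closed_form(u)
        finite_n = minus_log_density_per_term(u, 2000)
        print(f"u = {u:.1f}: S numeric = {s_num:.5f}, closed form = {s_exact:.5f}, "
              f"-(1/N) log P at N = 2000: {finite_n:.5f}")
```

The last column differs from $S(u)$ by terms of order $\log N / N$, which are neglected in the exponential estimate.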



1.6.4 The CLT at work on a simple case 

It is helpful to give some flesh to the above general statements, by work- 
ing out explicitly the convergence towards the Gaussian in two exactly 
soluble cases. On these examples, one clearly sees the domain of validity 
of the CLT as well as its limitations. 

Let us first study the case of positive random variables distributed
according to the exponential distribution:

$$P_1(x_1) = \Theta(x_1)\, \alpha e^{-\alpha x_1}, \qquad (1.78)$$

where $\Theta(x_1)$ is the function equal to 1 for $x_1 \ge 0$ and to 0 otherwise.
A simple computation shows that the above distribution is correctly
normalised, has a mean given by $m = \alpha^{-1}$ and a variance given by
$\sigma^2 = \alpha^{-2}$. Furthermore, the exponential distribution is asymmetrical;
its skewness is given by $c_3 = \langle (x - m)^3 \rangle = 2\alpha^{-3}$, or $\lambda_3 = 2$.

The sum of $N$ such variables is distributed according to the $N$th
convolution of the exponential distribution. According to the CLT this
distribution should approach a Gaussian of mean $mN$ and of variance
$N\alpha^{-2}$. The $N$th convolution of the exponential distribution can
be computed exactly. The result is:

$$P(x,N) = \Theta(x)\, \alpha^N \frac{x^{N-1} e^{-\alpha x}}{(N-1)!}, \qquad (1.79)$$

which is called a 'Gamma' distribution of index $N$. At first sight, this
distribution does not look very much like a Gaussian! For example, its
asymptotic behaviour is very far from that of a Gaussian: the 'left' side
is strictly zero for negative $x$, while the 'right' tail is exponential,
and thus much fatter than the Gaussian. It is thus very clear that the CLT
does not apply for values of $x$ too far from the mean value. However,
the central region around $Nm = N\alpha^{-1}$ is well described by a
Gaussian.

Footnote: The result (1.79) can be shown by induction using the definition (1.50).

The most probable value ($x^*$) is defined as:



$$\left.\frac{\partial P(x,N)}{\partial x}\right|_{x=x^*} = 0, \qquad (1.80)$$

or $x^* = (N-1)m$. An expansion in $x - x^*$ of $P(x,N)$ then gives us:

$$\log P(x,N) = -K(N-1) - \log m - \frac{\alpha^2 (x-x^*)^2}{2(N-1)} + \frac{\alpha^3 (x-x^*)^3}{3(N-1)^2} + O(x-x^*)^4, \qquad (1.81)$$

where

$$K(N) \equiv \log N! + N - N\log N \simeq \frac{1}{2}\log(2\pi N) \quad (N \to \infty). \qquad (1.82)$$

Hence, to second order in $x - x^*$, $P(x,N)$ is given by a Gaussian of mean
$(N-1)m$ and variance $(N-1)\sigma^2$. The relative difference between $N$
and $N-1$ goes to zero for large $N$. Hence, for the Gaussian approximation
to be valid, one requires not only that $N$ be large compared to one, but
also that the higher order terms in $(x - x^*)$ be negligible. The cubic
correction is small compared to 1 as long as $\alpha|x - x^*| \ll N^{2/3}$,
in agreement with the above general statement (1.72) for an elementary
distribution with a non zero third cumulant. Note also that for
$x \to \infty$, the exponential behaviour of the Gamma distribution
coincides (up to subleading terms in $x^{N-1}$) with the asymptotic
behaviour of the elementary distribution $P_1(x_1)$.
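The comparison between the exact Gamma distribution and its Gaussian approximation can be sketched numerically. The snippet below is only an illustration under stated assumptions: the function names, the choice $\alpha = 1$ and the value $N = 1000$ are ours, and the Gaussian is taken with mean $N/\alpha$ and variance $N/\alpha^2$ as suggested by the CLT. It evaluates the logarithm of the ratio between the exact density and the Gaussian at a distance of $u$ rms to the right of the mean.

```python
import math


def log_gamma_pdf(x, n, alpha=1.0):
    """Exact log-density of the sum of n exponential variables (Gamma of index n)."""
    return n * math.log(alpha) + (n - 1) * math.log(x) - alpha * x - math.lgamma(n)


def log_gauss_pdf(x, n, alpha=1.0):
    """Log-density of the CLT Gaussian with mean n/alpha and variance n/alpha^2."""
    mean, var = n / alpha, n / alpha**2
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def log_ratio(u, n, alpha=1.0):
    """log(P_exact / P_gauss) at x = mean + u * rms (x must stay positive)."""
    x = n / alpha + u * math.sqrt(n) / alpha
    return log_gamma_pdf(x, n, alpha) - log_gauss_pdf(x, n, alpha)


if __name__ == "__main__":
    n = 1000  # illustrative choice
    for u in (0.0, 1.0, 3.0, 10.0):
        print(f"u = {u:5.1f}   log(P_exact / P_gauss) = {log_ratio(u, n):+.4f}")
```

With these choices the log-ratio should stay close to zero near the mean and become large and positive far in the right tail, where the exponential decay of the exact distribution is much slower than the Gaussian one.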

Another very instructive example is provided by a distribution which
behaves as a power-law for large arguments, but at the same time has a
finite variance to ensure the validity of the CLT. Consider the following
explicit example of a Student distribution with $\mu = 3$:

$$P_1(x_1) = \frac{2a^3}{\pi (x_1^2 + a^2)^2}, \qquad (1.83)$$

where $a$ is a positive constant. This symmetric distribution behaves as
a power-law with $\mu = 3$ (cf. (1.14)); all its cumulants of order larger
or equal to three are infinite. However, its variance is finite and equal
to $a^2$.

It is useful to compute the characteristic function of this distribution,

$$\hat P_1(z) = (1 + a|z|)\, e^{-a|z|}, \qquad (1.84)$$

and the first terms of its small $z$ expansion, which read:

$$\hat P_1(z) \simeq 1 - \frac{z^2 a^2}{2} + \frac{|z|^3 a^3}{3} + O(z^4). \qquad (1.85)$$

The first singular term in this expansion is thus $|z|^3$, as expected from
the asymptotic behaviour of $P_1(x_1)$ in $x_1^{-4}$, and the divergence of
the moments of order larger than three.

The $N$th convolution of $P_1(x_1)$ thus has the following characteristic
function:

$$\hat P_1^N(z) = (1 + a|z|)^N e^{-aN|z|}, \qquad (1.86)$$

which, expanded around $z = 0$, gives:

$$\hat P_1^N(z) \simeq 1 - \frac{N z^2 a^2}{2} + \frac{N |z|^3 a^3}{3} + O(z^4). \qquad (1.87)$$

Note that the $|z|^3$ singularity (which signals the divergence of the
moments $m_n$ for $n > 3$) does not disappear under convolution, even if at
the same time $P(x,N)$ converges towards the Gaussian. The resolution of
this apparent paradox is again that the convergence towards the Gaussian
only concerns the centre of the distribution, while the tail in $x^{-4}$
survives for ever (as was mentioned in Section 1.5).

As follows from the CLT, the centre of $P(x,N)$ is well approximated,
for $N$ large, by a Gaussian of zero mean and variance $Na^2$:

$$P(x,N) \simeq \frac{1}{\sqrt{2\pi N a^2}} \exp\left(-\frac{x^2}{2Na^2}\right). \qquad (1.88)$$

On the other hand, since the power-law behaviour is conserved upon
addition and the tail amplitudes simply add (cf. (1.14)), one also
has, for large $x$'s:

$$P(x,N) \underset{x \to \infty}{\simeq} \frac{2Na^3}{\pi x^4}. \qquad (1.89)$$

The above two expressions (1.88) and (1.89) are not incompatible, since
these describe two very different regions of the distribution $P(x,N)$. For
fixed $N$, there is a characteristic value $x_0(N)$ beyond which the Gaussian
approximation for $P(x,N)$ is no longer accurate, and the distribution is
described by its asymptotic power-law regime. The order of magnitude
of $x_0(N)$ is fixed by looking at the point where the two regimes match
to one another:

$$\frac{1}{\sqrt{2\pi N a^2}} \exp\left(-\frac{x_0^2}{2Na^2}\right) \simeq \frac{2Na^3}{\pi x_0^4}. \qquad (1.90)$$
One thus finds,

$$x_0(N) \simeq a\sqrt{N\log N}, \qquad (1.91)$$

(neglecting subleading corrections for large $N$).

This means that the rescaled variable $U = X/(a\sqrt{N})$ becomes for
large $N$ a Gaussian variable of unit variance, but this description ceases
to be valid as soon as $u \sim \sqrt{\log N}$, which grows very slowly with
$N$. For example, for $N$ equal to a million, the Gaussian approximation is
only acceptable for fluctuations of $u$ of less than three or four rms!
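The slow growth of the Gaussian region can be illustrated by inverting the characteristic function $(1 + a|z|)^N e^{-aN|z|}$ numerically. The sketch below is a rough illustration only: the function names, the integration cut-off `t_max` and the grid size are our assumptions, and a plain trapezoidal rule is used. After the change of variable $z = t/(a\sqrt{N})$ the scale $a$ drops out, so the code works directly with the rescaled variable $u$. Note that $\sqrt{\log N}$ neglects logarithmic corrections, so the breakdown observed numerically may sit somewhat further out than this leading-order estimate.

```python
import numpy as np


def scaled_density(u, n, t_max=40.0, n_grid=400_001):
    """Density of U = X / (a sqrt(N)) for the sum of N Student (mu = 3) variables.

    The characteristic function of U is exp(N * [log(1 + s) - s]) with
    s = |t| / sqrt(N); it is even in t, so the inverse Fourier transform
    reduces to (1/pi) * integral_0^inf cos(t u) phi(t) dt.
    """
    t = np.linspace(0.0, t_max, n_grid)
    s = t / np.sqrt(n)
    char_fn = np.exp(n * (np.log1p(s) - s))
    integrand = np.cos(t * u) * char_fn
    h = t[1] - t[0]
    # Trapezoidal rule written out explicitly.
    integral = h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return integral / np.pi


def gaussian_density(u):
    """Unit-variance Gaussian density, the CLT prediction for U."""
    return np.exp(-0.5 * u * u) / np.sqrt(2.0 * np.pi)


if __name__ == "__main__":
    n = 10**6
    print("sqrt(log N) =", np.sqrt(np.log(n)))
    for u in (1.0, 2.0, 3.0, 4.0, 5.0, 6.0):
        ratio = scaled_density(u, n) / gaussian_density(u)
        print(f"u = {u:.0f}   exact / Gaussian = {ratio:.3f}")
```

The ratio should remain close to one in the centre and grow rapidly a few rms away from it, once the power-law tail dominates.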

Finally, the CLT states that the weight of the regions where $P(x,N)$
substantially differs from the Gaussian goes to zero when $N$ becomes
large. For our example, one finds that the probability that $X$ falls in the
tail region rather than in the central region is given by:

$$\mathcal{P}_<(-x_0) + \mathcal{P}_>(x_0) \simeq 2\int_{a\sqrt{N\log N}}^{\infty} \frac{2a^3 N}{\pi x^4}\, dx \propto \frac{1}{\sqrt{N}\log^{3/2} N}, \qquad (1.92)$$

which indeed goes to zero for large $N$.

The above arguments are not special to the case $\mu = 3$ and in fact
apply more generally, as long as $\mu > 2$, i.e. when the variance is finite.
In the general case, one finds that the CLT is valid in the region
$|x| \ll x_0 \propto \sqrt{N\log N}$, and that the weight of the non-Gaussian
tails is given by:

$$\mathcal{P}_<(-x_0) + \mathcal{P}_>(x_0) \propto \frac{1}{N^{\mu/2-1}\log^{\mu/2} N}, \qquad (1.93)$$

which tends to zero for large $N$. However, one should notice that as $\mu$
approaches the 'dangerous' value $\mu = 2$, the weight of the tails becomes
more and more important. For $\mu < 2$, the whole argument collapses
since the weight of the tails would grow with $N$. In this case, however,
the convergence is no longer towards the Gaussian, but towards the Levy
distribution of exponent $\mu$.

1.6.5 Truncated Levy distributions 

An interesting case is when the elementary distribution $P_1(x_1)$ is a
truncated Levy distribution (TLD) as defined in Section 1.3.3. The first
cumulants of the distribution defined by Eq. (1.23) read, for $1 < \mu < 2$:

$$c_2 = \mu(\mu - 1) \frac{a_\mu}{|\cos \pi\mu/2|}\, \alpha^{\mu - 2}, \qquad c_3 = 0. \qquad (1.94)$$

The kurtosis $\kappa = \lambda_4 = c_4/c_2^2$ is given by:

$$\lambda_4 = \frac{(3 - \mu)(2 - \mu)\, |\cos \pi\mu/2|}{\mu(\mu - 1)\, a_\mu \alpha^{\mu}}. \qquad (1.95)$$

Note that the case $\mu = 2$ corresponds to the Gaussian, for which
$\lambda_4 = 0$ as expected. On the other hand, when $\alpha \to 0$, one
recovers a pure Levy distribution, for which $c_2$ and $c_4$ are formally
infinite. Finally, if $\alpha \to \infty$ with $a_\mu \alpha^{\mu-2}$ fixed,
one also recovers the Gaussian.

If one considers the sum of $N$ random variables distributed according
to a TLD, the condition for the CLT to be valid reads (for $\mu < 2$):

$$N \gg N^* = \lambda_4 \;\Longrightarrow\; (N a_\mu)^{1/\mu} \gg \alpha^{-1}. \qquad (1.96)$$

This condition has a very simple intuitive meaning. A TLD behaves very
much like a pure Levy distribution as long as $x \ll \alpha^{-1}$. In
particular, it behaves as a power-law of exponent $\mu$ and tail amplitude
$A^\mu \propto a_\mu$ in the region where $x$ is large but still much smaller
than $\alpha^{-1}$ (we thus also assume that $\alpha$ is very small). If $N$
is not too large, most values of $x$ fall in the Levy-like region. The
largest value of $x$ encountered is thus of order
$x_{\max} \simeq A N^{1/\mu}$ (cf. (1.42)). If $x_{\max}$ is very small
compared to $\alpha^{-1}$, it is consistent to forget the exponential
cut-off and think of the elementary distribution as a pure Levy
distribution. One thus observes a first regime in $N$ where the typical
value of $X$ grows as $N^{1/\mu}$, as if $\alpha$ was zero. However, as
illustrated in Fig. 1.9, this regime ends when $x_{\max}$ reaches the
cut-off value $\alpha^{-1}$: this happens precisely when $N$ is of the order
of $N^*$ defined above. For $N > N^*$, the variable $X$ progressively
converges towards a Gaussian variable of width $\sqrt{N}$, at least in the
region where $|x| \ll \sigma N^{3/4}/N^{*1/4}$. The typical amplitude of $X$
thus behaves (as a function of $N$) as sketched in Fig. 1.9. Notice that
the asymptotic part of the distribution of $X$ (outside the central region)
decays as an exponential for all values of $N$.



Footnote (on condition (1.96)): One can see by inspection that the other
conditions, concerning higher order cumulants, and which read
$N^{k-1}\lambda_{2k}^{-1} \gg 1$, are actually equivalent to the one written
here.

Footnote (on the first regime in $N$): Note however that the variance of $X$
grows like $N$ for all $N$. However, the variance is dominated by the cut-off
and, in the region $N \ll N^*$, grossly overestimates the typical values of
$X$; see Section 2.3.2.

Figure 1.9: Behaviour of the typical value of $X$ as a function of $N$ for
TLD variables. When $N \ll N^*$, $x$ grows as $N^{1/\mu}$ (dotted line). When
$N \sim N^*$, $x$ reaches the value $\alpha^{-1}$ and the exponential cut-off
starts being relevant. When $N \gg N^*$, the behaviour predicted by the CLT
sets in, and one recovers $x \propto \sqrt{N}$ (plain line).

1.6.6 Conclusion: survival and vanishing of tails

The CLT thus teaches us that if the number of terms in a sum is large,
the sum becomes (nearly) a Gaussian variable. This sum can represent
the temporal aggregation of the daily fluctuations of a financial asset,
or the aggregation, in a portfolio, of different stocks. The Gaussian (or
non-Gaussian) nature of this sum is thus of crucial importance for risk
control, since the extreme tails of the distribution correspond to the
most 'dangerous' fluctuations. As we have discussed above, fluctuations
are never Gaussian in the far-tails: one can explicitly show that if the
elementary distribution decays as a power-law (or as an exponential,
which formally corresponds to $\mu = \infty$), the distribution of the sum
decays in the very same manner outside the central region, i.e. much more
slowly than the Gaussian. The CLT simply ensures that these tail regions
are expelled more and more towards large values of $X$ when $N$ grows,
and their associated probability is smaller and smaller. When confronted
with a concrete problem, one must decide whether $N$ is large enough to
be satisfied with a Gaussian description of the risks. In particular, if
$N$ is less than the characteristic value $N^*$ defined above, the Gaussian
approximation is very bad.

1.7 Correlations, dependence and non-stationary models (*)

We have assumed up to now that the random variables were independent
and identically distributed. Although the general case cannot be
discussed as thoroughly as the IID case, it is useful to illustrate how the
CLT must be modified on a few examples, some of which are particularly
relevant in the context of financial time series.

1.7.1 Correlations 

Let us assume that the correlation function $C_{i,j}$ (defined as
$\langle x_i x_j \rangle - m^2$) of the random variables is non zero for
$i \neq j$. We also assume that the process is stationary, i.e. that
$C_{i,j}$ only depends on $|i - j|$: $C_{i,j} = C(|i - j|)$, with
$C(\infty) = 0$. The variance of the sum can be expressed in terms of the
matrix $C$ as:

$$\langle x^2 \rangle = \sum_{i,j=1}^{N} C_{i,j} = N\sigma^2 + 2N \sum_{\ell=1}^{N} \left(1 - \frac{\ell}{N}\right) C(\ell), \qquad (1.97)$$

where $\sigma^2 \equiv C(0)$. From this expression, it is readily seen that
if $C(\ell)$ decays faster than $1/\ell$ for large $\ell$, the sum over
$\ell$ tends to a constant for large $N$, and thus the variance of the sum
still grows as $N$, as for the usual CLT. If however $C(\ell)$ decays for
large $\ell$ as a power-law $\ell^{-\nu}$, with $\nu < 1$, then the variance
grows faster than $N$, as $N^{2-\nu}$: correlations thus enhance
fluctuations. Hence, when $\nu < 1$, the standard CLT certainly has to be
amended. The problem of the limit distribution in these cases is however
not solved in general. For example, if the $x_i$ are correlated Gaussian
variables, it is easy to show that the resulting sum is also Gaussian,
whatever the value of $\nu$. Another solvable case is when the $x_i$ are
correlated Gaussian variables, but one takes the sum of the squares of
the $x_i$'s. This sum converges towards a Gaussian of width $\sqrt{N}$
whenever $\nu > 1/2$, but towards a non trivial limit distribution of a new
kind (i.e. neither Gaussian nor Levy stable) when $\nu < 1/2$. In this last
case, the proper rescaling factor must be chosen as $N^{1-\nu}$.

Footnote: We again assume in the following, without loss of generality,
that the mean $m$ is zero.
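The growth of the variance of the sum with $N$ can be checked directly from the double sum over the correlation matrix, rewritten as $N C(0) + 2N \sum_{\ell} (1 - \ell/N) C(\ell)$. The following sketch is an illustration under our own assumptions: the function names are hypothetical, $C(0) = 1$, and the correlation function is taken to be exactly $C(\ell) = \ell^{-\nu}$ for all $\ell \geq 1$. It estimates the local exponent of the variance between two values of $N$.

```python
import numpy as np


def sum_variance(n, corr, sigma2=1.0):
    """Variance of the sum of n stationary variables.

    corr(lags) returns C(l) for an array of lags l >= 1, and sigma2 = C(0).
    """
    lags = np.arange(1, n)
    return n * sigma2 + 2.0 * n * np.sum((1.0 - lags / n) * corr(lags))


def growth_exponent(nu, n1=10_000, n2=1_000_000):
    """Estimate d log(variance) / d log(N) for C(l) = l**(-nu)."""

    def corr(lags):
        return lags.astype(float) ** (-nu)

    v1, v2 = sum_variance(n1, corr), sum_variance(n2, corr)
    return np.log(v2 / v1) / np.log(n2 / n1)


if __name__ == "__main__":
    for nu in (0.5, 2.0):
        print(f"nu = {nu}: variance grows roughly as N^{growth_exponent(nu):.3f}")
```

For $\nu = 0.5$ the exponent should come out close to $2 - \nu = 1.5$, while for $\nu = 2$ it should be close to 1, the usual CLT scaling.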

One can also construct anti-correlated random variables, the sum
of which grows slower than $\sqrt{N}$. In the case of power-law correlated
or anticorrelated Gaussian random variables, one speaks of 'fractional
Brownian motion'. This notion was introduced by Mandelbrot and Van
Ness [Mandelbrot].



1.7.2 Non stationary models and dependence

It may happen that the distributions of the elementary random variables
$P_1(x_1), P_2(x_2), \ldots, P_N(x_N)$ are not all identical. This is the
case, for example, when the variance of the random process depends upon
time: in financial markets, it is a well known fact that the daily
volatility is time dependent, taking rather high levels in periods of
uncertainty, and reverting back to lower values in calmer periods. For
example, the volatility of the bond market was very high during 1994, and
decreased in later years. Similarly, the volatility of stock markets has
increased since August 1997.

If the distribution $P_k$ varies sufficiently 'slowly', one can in principle
measure some of its moments (for example its mean and variance) over
a time scale which is long enough to allow for a precise determination
of these moments, but short compared to the time scale over which $P_k$
is expected to vary. The situation is less clear if $P_k$ varies 'rapidly'.
Suppose for example that $P_k(x_k)$ is a Gaussian distribution of variance
$\sigma_k^2$, which is itself a random variable. We shall denote as
$\overline{(\ldots)}$ the average over the random variable $\sigma_k$, to
distinguish it from the notation $\langle \ldots \rangle_k$ which we have
used to describe the average over the probability distribution $P_k$. If
$\sigma_k$ varies rapidly, it is impossible to separate the two sources of
uncertainty. Thus, the empirical histogram constructed from the series
$\{x_1, x_2, \ldots, x_N\}$ leads to an 'apparent' distribution $\bar P$
which is non-Gaussian even if each individual $P_k$ is Gaussian. Indeed,
from:

$$\bar P(x) \equiv \int P(\sigma) \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right) d\sigma, \qquad (1.98)$$

one can calculate the kurtosis of $\bar P$ as:

$$\bar\kappa = \frac{\overline{\langle x^4 \rangle}}{\left(\overline{\langle x^2 \rangle}\right)^2} - 3 \equiv 3\left(\frac{\overline{\sigma^4}}{\left(\overline{\sigma^2}\right)^2} - 1\right). \qquad (1.99)$$



Since for any random variable one has
$\overline{\sigma^4} \geq (\overline{\sigma^2})^2$ (the equality being
reached only if $\sigma^2$ does not fluctuate at all), one finds that
$\bar\kappa$ is always positive. The volatility fluctuations can thus lead
to 'fat tails'. More precisely, let us assume that the probability
distribution of the RMS, $P(\sigma)$, decays itself for large $\sigma$ as
$\exp(-\sigma^c)$, $c > 0$. Assuming $P_k$ to be Gaussian, it is easy to
obtain, using a saddle point method (cf. (1.75)), that for large $x$ one
has:

$$\log[\bar P(x)] \propto -x^{\frac{2c}{2+c}}. \qquad (1.100)$$

Since $c < 2 + c$, this asymptotic decay is always much slower than in
the Gaussian case, which corresponds to $c \to \infty$. The case where the
volatility itself has a Gaussian tail ($c = 2$) leads to an exponential
decay of $\bar P(x)$.
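The positive kurtosis produced by a fluctuating volatility is easy to see on a toy example. Since $\langle x^4 \rangle = 3\sigma^4$ for a Gaussian of given $\sigma$, the apparent kurtosis of the mixture is $3(\overline{\sigma^4}/(\overline{\sigma^2})^2 - 1)$. The sketch below is only an illustration: the two-state volatility (values 1 and 3 with equal weights), the function names and the sample size are assumptions made here, not taken from the text.

```python
import numpy as np


def mixture_kurtosis(sigmas, weights):
    """Kurtosis 3 * (mean(sigma^4) / mean(sigma^2)^2 - 1) of a Gaussian scale mixture."""
    sigmas = np.asarray(sigmas, dtype=float)
    weights = np.asarray(weights, dtype=float)
    m2 = np.sum(weights * sigmas**2)
    m4 = np.sum(weights * sigmas**4)
    return 3.0 * (m4 / m2**2 - 1.0)


def sample_kurtosis(x):
    """Empirical excess kurtosis of a sample."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0


def simulate(sigmas, weights, n_samples=1_000_000, seed=0):
    """Draw x = sigma * epsilon, with sigma redrawn independently at each step."""
    rng = np.random.default_rng(seed)
    sigma = rng.choice(sigmas, size=n_samples, p=weights)
    return sigma * rng.standard_normal(n_samples)


if __name__ == "__main__":
    sigmas, weights = [1.0, 3.0], [0.5, 0.5]  # hypothetical two-state volatility
    print("predicted kurtosis:", mixture_kurtosis(sigmas, weights))
    print("sample kurtosis:   ", sample_kurtosis(simulate(sigmas, weights)))
```

Each individual draw is Gaussian, yet the histogram of the whole series has a clearly positive kurtosis, in line with the argument above.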

Another interesting case is when $\sigma^2$ is distributed as a completely
asymmetric Levy distribution ($\beta = 1$) of exponent $\mu < 1$. Using the
properties of Levy distributions, one can then show that $\bar P$ is itself
a symmetric Levy distribution ($\beta = 0$), of exponent equal to $2\mu$.

If the fluctuations of $\sigma_k$ are themselves correlated, one observes an
interesting case of dependence. For example, if $\sigma_k$ is large,
$\sigma_{k+1}$ will probably also be large. The fluctuation $x_k$ thus has a
large probability to be large (but of arbitrary sign) twice in a row. We
shall often refer, in the following, to a simple model where $x_k$ can be
written as a product $\epsilon_k \sigma_k$, where $\epsilon_k$ are IID random
variables of zero mean and unit variance, and $\sigma_k$ corresponds to the
local 'scale' of the fluctuations, which can be correlated in time. The
correlation function of the $x_k$ is thus given by:

$$\overline{\langle x_i x_j \rangle} = \overline{\sigma_i \sigma_j}\, \langle \epsilon_i \epsilon_j \rangle = \delta_{i,j}\, \overline{\sigma^2}. \qquad (1.101)$$

Hence the $x_k$ are uncorrelated random variables, but they are not
independent since a higher order correlation function reveals a richer
structure. Let us for example consider the correlation of $x_k^2$:

$$\overline{\langle x_i^2 x_j^2 \rangle} - \overline{\langle x_i^2 \rangle}\; \overline{\langle x_j^2 \rangle} = \overline{\sigma_i^2 \sigma_j^2} - \overline{\sigma_i^2}\; \overline{\sigma_j^2} \qquad (i \neq j), \qquad (1.102)$$

which indeed has an interesting temporal behaviour: see Section 2.4.

Footnote: Note that for $i \neq j$ this correlation function can be zero
either because $\sigma$ is identically equal to a certain value $\sigma_0$,
or because the fluctuations of $\sigma$ are completely uncorrelated from one
time to the next.

However, even if the correlation function
$\overline{\sigma_i^2 \sigma_j^2} - \overline{\sigma^2}^2$ decreases very
slowly with $|i - j|$, one can show that the sum of the $x_k$, obtained as
$\sum_{k=1}^{N} \epsilon_k \sigma_k$, is still governed by the CLT, and
converges for large $N$ towards a Gaussian variable. A way to see this is to
compute the average kurtosis of the sum, $\kappa_N$. As shown in Appendix A,
one finds the following result:

$$\kappa_N = \frac{1}{N}\left[\kappa_0 + (3 + \kappa_0)\, g(0) + 6 \sum_{\ell=1}^{N} \left(1 - \frac{\ell}{N}\right) g(\ell)\right], \qquad (1.103)$$

where $\kappa_0$ is the kurtosis of the variable $\epsilon$, and $g(\ell)$
the correlation function of the variance, defined as:

$$\overline{\sigma_i^2 \sigma_j^2} - \overline{\sigma^2}^2 = \overline{\sigma^2}^2\, g(|i - j|). \qquad (1.104)$$

It is interesting to see that for $N = 1$, the above formula gives
$\kappa_1 = \kappa_0 + (3 + \kappa_0)\, g(0) > \kappa_0$, which means that
even if $\kappa_0 = 0$, a fluctuating volatility is enough to produce some
kurtosis. More importantly, one sees that if the variance correlation
function $g(\ell)$ decays with $\ell$, the kurtosis $\kappa_N$ tends to zero
for large $N$, thus showing that the sum indeed converges towards a Gaussian
variable. For example, if $g(\ell)$ decays as a power-law $\ell^{-\nu}$ for
large $\ell$, one finds that for large $N$:

$$\kappa_N \propto \frac{1}{N} \quad \text{for } \nu > 1; \qquad \kappa_N \propto \frac{1}{N^{\nu}} \quad \text{for } \nu < 1. \qquad (1.105)$$

Hence, long-range correlation in the variance considerably slows down
the convergence towards the Gaussian. This remark will be of importance
in the following, since financial time series often reveal long-ranged
volatility fluctuations.
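The two regimes can be checked by evaluating the kurtosis of the sum, $\kappa_N = \frac{1}{N}[\kappa_0 + (3 + \kappa_0) g(0) + 6\sum_{\ell}(1 - \ell/N) g(\ell)]$, for a power-law variance correlation. The sketch below rests on assumptions of ours: the function names are hypothetical, and $g(\ell) = g_0 (1 + \ell)^{-\nu}$ is one convenient choice that is finite at $\ell = 0$ and behaves as $\ell^{-\nu}$ for large $\ell$.

```python
import numpy as np


def kurtosis_of_sum(n, kappa0, g):
    """Kurtosis of the sum of n variables x_k = sigma_k * epsilon_k.

    kappa0 is the kurtosis of epsilon; g(l) is the normalised correlation
    function of the variance, accepting scalars or arrays of lags.
    """
    lags = np.arange(1, n)
    tail = 6.0 * np.sum((1.0 - lags / n) * g(lags))
    return (kappa0 + (3.0 + kappa0) * g(0) + tail) / n


def power_law_g(nu, g0=1.0):
    """Hypothetical variance correlation g(l) = g0 * (1 + l)**(-nu)."""
    return lambda lags: g0 * (1.0 + np.asarray(lags, dtype=float)) ** (-nu)


def decay_exponent(nu, kappa0=0.0, n1=10_000, n2=1_000_000):
    """Estimate d log(kappa_N) / d log(N) between n1 and n2."""
    g = power_law_g(nu)
    k1, k2 = kurtosis_of_sum(n1, kappa0, g), kurtosis_of_sum(n2, kappa0, g)
    return np.log(k2 / k1) / np.log(n2 / n1)


if __name__ == "__main__":
    for nu in (0.3, 2.0):
        print(f"nu = {nu}: kappa_N decays roughly as N^{decay_exponent(nu):.3f}")
```

For $\nu = 0.3$ the decay exponent should be close to $-0.3$, much slower than the $1/N$ decay found for $\nu = 2$.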



1.8 Central limit theorem for random matrices (*) 

One interesting application of the CLT concerns the spectral properties of
'random matrices'. The theory of Random Matrices has made enormous
progress during the past thirty years, with many applications in physical
sciences and elsewhere. More recently, it has been suggested that random
matrices might also play an important role in finance: an example is
discussed in Section 2.7. It is therefore appropriate to give a cursory
discussion of some salient properties of random matrices. The simplest
ensemble of random matrices is one where all elements of the matrix
$H$ are IID random variables, with the only constraint that the matrix
be symmetrical ($H_{ij} = H_{ji}$). One interesting result is that in the
limit of very large matrices, the distribution of its eigenvalues has
universal properties, which are to a large extent independent of the
distribution of the elements of the matrix. This is actually the
consequence of the CLT, as we will show below.

Theory of Financial Risk, © Science & Finance 1999.

Let us introduce first some notations. The matrix $H$ is a square,
$N \times N$ symmetric matrix. Its eigenvalues are $\lambda_\alpha$,
with $\alpha = 1, \ldots, N$. The density of eigenvalues is defined as:



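$$\rho(\lambda) = \frac{1}{N} \sum_{\alpha=1}^{N} \delta(\lambda - \lambda_\alpha), \qquad (1.106)$$

where $\delta$ is the Dirac function. We shall also need the so-called
'resolvent' $G(\lambda)$ of the matrix $H$, defined as:

$$G_{ij}(\lambda) \equiv \left(\frac{1}{\lambda \mathbf{1} - H}\right)_{ij}, \qquad (1.107)$$

where $\mathbf{1}$ is the identity matrix. The trace of $G(\lambda)$ can be
expressed using the eigenvalues of $H$ as:

$$\mathrm{Tr}\, G(\lambda) = \sum_{\alpha=1}^{N} \frac{1}{\lambda - \lambda_\alpha}. \qquad (1.108)$$

The 'trick' that allows one to calculate $\rho(\lambda)$ in the large $N$
limit is the following representation of the $\delta$ function:

$$\frac{1}{x - i\epsilon} = PP\,\frac{1}{x} + i\pi\delta(x) \qquad (\epsilon \to 0), \qquad (1.109)$$

where $PP$ means the principal part. Therefore, $\rho(\lambda)$ can be
expressed as:

$$\rho(\lambda) = \lim_{\epsilon \to 0} \frac{1}{N\pi}\, \Im\left(\mathrm{Tr}\, G(\lambda - i\epsilon)\right). \qquad (1.110)$$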

Our task is therefore to obtain an expression for the resolvent
$G(\lambda)$. This can be done by establishing a recursion relation,
allowing one to compute $G(\lambda)$ for a matrix $H$ with one extra row
and one extra column, the elements of which are $H_{0i}$. One then
computes $G^{N+1}_{00}(\lambda)$ (the superscript stands for the size of
the matrix $H$) using the standard formula for matrix inversion:

$$G^{N+1}_{00}(\lambda) = \frac{\mathrm{minor}(\lambda \mathbf{1} - H)_{00}}{\det(\lambda \mathbf{1} - H)}. \qquad (1.111)$$

Now, one expands the determinant appearing in the denominator in
minors along the first row, and then each minor is itself expanded in
subminors along their first column. After a little thought, this finally
leads to the following expression for $G^{N+1}_{00}(\lambda)$:

$$\frac{1}{G^{N+1}_{00}(\lambda)} = \lambda - H_{00} - \sum_{i,j=1}^{N} H_{0i} H_{0j}\, G^{N}_{ij}(\lambda). \qquad (1.112)$$
This relation is general, without any assumption on the $H_{ij}$. Now, we
assume that the $H_{ij}$'s are IID random variables, of zero mean and
variance equal to $\langle H_{ij}^2 \rangle = \sigma^2/N$. This scaling with
$N$ can be understood as follows: when the matrix $H$ acts on a certain
vector, each component of the image vector is a sum of $N$ random
variables. In order to keep the image vector (and thus the corresponding
eigenvalue) finite when $N \to \infty$, one should scale the elements of
the matrix with the factor $1/\sqrt{N}$.

One could also write a recursion relation for $G^{N+1}_{0i}$, and establish
self-consistently that $G_{ij} \sim 1/\sqrt{N}$ for $i \neq j$. On the other
hand, due to the diagonal term $\lambda$, $G_{ii}$ remains finite for
$N \to \infty$. This scaling allows us to discard all the terms with
$i \neq j$ in the sum appearing in the right hand side of Eq. (1.112).
Furthermore, since $H_{00} \sim 1/\sqrt{N}$, this term can be neglected
compared to $\lambda$. This finally leads to a simplified recursion
relation, valid in the limit $N \to \infty$:

$$\frac{1}{G^{N+1}_{00}(\lambda)} \simeq \lambda - \sum_{i=1}^{N} H_{0i}^2\, G^{N}_{ii}(\lambda). \qquad (1.113)$$

Now, using the CLT, we know that the last sum converges, for large $N$,
towards $\frac{\sigma^2}{N} \sum_{i=1}^{N} G^{N}_{ii}(\lambda)$. This result
is independent of the precise statistics of the $H_{0i}$, provided their
variance is finite. This shows that $G_{00}$ converges for large $N$ towards
a well defined limit $G_\infty$, which obeys the following limit equation:

$$\frac{1}{G_\infty(\lambda)} = \lambda - \sigma^2 G_\infty(\lambda). \qquad (1.114)$$

The solution to this second order equation reads:

$$G_\infty(\lambda) = \frac{1}{2\sigma^2}\left[\lambda - \sqrt{\lambda^2 - 4\sigma^2}\right]. \qquad (1.115)$$

(The correct solution is chosen to recover the right limit for $\sigma = 0$.)
Now, the only way for this quantity to have a non zero imaginary part
when one adds to $\lambda$ a small imaginary term $i\epsilon$ which tends to
zero is that the square root itself is imaginary. The final result for the
density of eigenvalues is therefore:

$$\rho(\lambda) = \frac{1}{2\pi\sigma^2} \sqrt{4\sigma^2 - \lambda^2} \quad \text{for } |\lambda| \leq 2\sigma, \qquad (1.116)$$

and zero elsewhere. This is the well known 'semi-circle' law for the
density of states, first derived by Wigner. This result can be obtained
by a variety of other methods if the distribution of matrix elements is
Gaussian.

Footnote: The case of Levy distributed $H_{ij}$'s with infinite variance has
been investigated in: P. Cizeau, J.-P. Bouchaud, "Theory of Levy matrices",
Phys. Rev. E 50, 1810 (1994).

In finance, one often encounters correlation matrices $C$, which
have the special property of being positive definite. $C$ can be written as
$C = HH^T$, where $H^T$ is the matrix transpose of $H$. In general, $H$ is a
rectangular matrix of size $M \times N$. In Chapter 2, $M$ will be the
number of assets, and $N$ the number of observations (days). In the
particular case where $N = M$, the eigenvalues of $C$ are simply obtained
from those of $H$ by squaring them:

$$\lambda_C = \lambda_H^2. \qquad (1.117)$$
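Both the semi-circle law and the squaring relation (1.117) can be checked on a single large random matrix. The sketch below is an illustration under our own assumptions: the function names, the matrix size and the choice of uniformly distributed (hence non-Gaussian) entries are ours, the latter being meant to probe the claimed universality. The entries are scaled so that their variance is $\sigma^2/N$.

```python
import numpy as np


def symmetric_random_matrix(n, sigma=1.0, seed=0):
    """Symmetric n x n matrix whose entries are uniform with variance sigma^2 / n."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(-1.0, 1.0, size=(n, n))  # variance 1/3 before rescaling
    a *= sigma * np.sqrt(3.0 / n)
    return np.triu(a) + np.triu(a, 1).T


def semicircle_density(lam, sigma=1.0):
    """Semi-circle density sqrt(4 sigma^2 - lam^2) / (2 pi sigma^2), zero outside."""
    lam = np.asarray(lam, dtype=float)
    inside = np.clip(4.0 * sigma**2 - lam**2, 0.0, None)
    return np.sqrt(inside) / (2.0 * np.pi * sigma**2)


def semicircle_weight(a, b, sigma=1.0, n_grid=20_001):
    """Weight of the semi-circle law on [a, b], by the trapezoidal rule."""
    x = np.linspace(a, b, n_grid)
    y = semicircle_density(x, sigma)
    return (x[1] - x[0]) * (y.sum() - 0.5 * (y[0] + y[-1]))


if __name__ == "__main__":
    h = symmetric_random_matrix(500)
    eig_h = np.linalg.eigvalsh(h)
    eig_c = np.linalg.eigvalsh(h @ h.T)
    print("largest |eigenvalue| of H:", np.abs(eig_h).max(), "(edge at 2 sigma = 2)")
    print("fraction with |lambda| < 1:", np.mean(np.abs(eig_h) < 1.0),
          "vs semi-circle weight", semicircle_weight(-1.0, 1.0))
    print("eigenvalues of C are squares:",
          np.allclose(np.sort(eig_h**2), eig_c, atol=1e-8))
```

Even at this moderate size, the spectrum should fill the interval $[-2\sigma, 2\sigma]$ with the weights predicted by the semi-circle, up to finite-size effects near the edges.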

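If one assumes that the elements of $H$ are random variables, the
density of eigenvalues of $C$ can easily be obtained from:

$$\rho(\lambda_C)\, d\lambda_C = 2\rho(\lambda_H)\, d\lambda_H \quad \text{for } \lambda_H > 0, \qquad (1.118)$$

which leads to:

$$\rho(\lambda_C) = \frac{1}{2\pi\sigma^2} \sqrt{\frac{4\sigma^2 - \lambda_C}{\lambda_C}} \quad \text{for } 0 \leq \lambda_C \leq 4\sigma^2, \qquad (1.119)$$

and zero elsewhere. For $N \neq M$, a similar formula exists, which we
shall use in the following. In the limit $N, M \to \infty$, with a fixed
ratio $Q = N/M \geq 1$, one has:

$$\rho(\lambda_C) = \frac{Q}{2\pi\sigma^2}\, \frac{\sqrt{(\lambda_{\max} - \lambda_C)(\lambda_C - \lambda_{\min})}}{\lambda_C}, \qquad \lambda^{\max}_{\min} = \sigma^2\left(1 + 1/Q \pm 2\sqrt{1/Q}\right), \qquad (1.120)$$

with $\lambda_C \in [\lambda_{\min}, \lambda_{\max}]$. This form is actually
also valid for $Q < 1$, except that there appears a finite fraction of
strictly zero eigenvalues, of weight $1 - Q$.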

The most important features predicted by Eq. (1.120) are:

• The fact that the lower 'edge' of the spectrum is strictly positive
(except for $Q = 1$); there are therefore no eigenvalues between 0 and
$\lambda_{\min}$. Near this edge, the density of eigenvalues exhibits a
sharp maximum, except in the limit $Q = 1$ ($\lambda_{\min} = 0$) where it
diverges as $\sim 1/\sqrt{\lambda}$.

• The density of eigenvalues also vanishes above a certain upper edge
$\lambda_{\max}$.

Footnote: See A. Edelman, "Eigenvalues and condition numbers of random
matrices," SIAM J. Matrix Anal. Appl. 9, 543 (1988).

Figure 1.10: Graph of Eq. (1.120) for $Q$ = 1, 2 and 5.

Note that all the above results are only valid in the limit $N \to \infty$.
For finite $N$, the singularities present at both edges are smoothed: the
edges become somewhat blurred, with a small probability of finding
eigenvalues above $\lambda_{\max}$ and below $\lambda_{\min}$, which goes to
zero when $N$ becomes large.

In Chapter 2, we will compare the empirical distribution of the
eigenvalues of the correlation matrix of stocks corresponding to different
markets with the theoretical prediction given by Eq. (1.120).
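The spectrum of $C = HH^T$ for a rectangular $H$ can be simulated in a few lines. The sketch below is a rough illustration under our own assumptions: the function names, the sizes $M = 400$, $N = 800$ (so $Q = 2$) and the Gaussian entries are ours. The density coded is $\frac{Q}{2\pi\sigma^2}\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}/\lambda$ with edges $\sigma^2(1 + 1/Q \pm 2\sqrt{1/Q})$, and the entries of $H$ have variance $\sigma^2/N$.

```python
import numpy as np


def spectrum_edges(q, sigma=1.0):
    """Lower and upper edges sigma^2 * (1 + 1/Q -/+ 2 sqrt(1/Q))."""
    shift = 2.0 * np.sqrt(1.0 / q)
    return sigma**2 * (1.0 + 1.0 / q - shift), sigma**2 * (1.0 + 1.0 / q + shift)


def spectrum_density(lam, q, sigma=1.0):
    """Limiting eigenvalue density of C = H H^T for Q = N/M >= 1 (lam > 0)."""
    lam = np.asarray(lam, dtype=float)
    lo, hi = spectrum_edges(q, sigma)
    inside = np.clip((hi - lam) * (lam - lo), 0.0, None)
    return q * np.sqrt(inside) / (2.0 * np.pi * sigma**2 * lam)


def correlation_eigenvalues(m, n, sigma=1.0, seed=0):
    """Eigenvalues of C = H H^T, with H an m x n matrix of variance sigma^2 / n."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((m, n)) * sigma / np.sqrt(n)
    return np.linalg.eigvalsh(h @ h.T)


if __name__ == "__main__":
    m, n = 400, 800  # illustrative sizes, Q = 2
    eig = correlation_eigenvalues(m, n)
    lo, hi = spectrum_edges(n / m)
    print(f"predicted edges: [{lo:.3f}, {hi:.3f}]")
    print(f"observed range:  [{eig.min():.3f}, {eig.max():.3f}]")
```

For finite sizes the observed extreme eigenvalues should sit close to, but not exactly on, the predicted edges, in line with the blurring discussed above.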



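Footnote (on the blurring of the edges at finite $N$): See e.g. M. J. Bowick,
E. Brezin, "Universal scaling of the tails of the density of eigenvalues in
random matrix models," Phys. Lett. B268, 21 (1991).

1.9 Appendix A: non-stationarity and anomalous kurtosis

In this appendix, we calculate the average kurtosis of the sum
$\sum_{i=1}^{N} \delta x_i$, assuming that the $\delta x_i$ can be written as
$\sigma_i \epsilon_i$. The $\sigma_i$'s are correlated as:

$$\overline{(D_k - \bar D)(D_i - \bar D)} = \bar D^2\, g(|i - k|), \qquad D_k \equiv \sigma_k^2. \qquad (1.121)$$

Let us first compute
$\overline{\langle (\sum_{i=1}^{N} \delta x_i)^4 \rangle}$, where
$\langle \ldots \rangle$ means an average over the $\epsilon_k$'s and the
overline means an average over the $\sigma_k$'s. If
$\langle \epsilon_i \rangle = 0$, and
$\langle \epsilon_i \epsilon_j \rangle = 0$ for $i \neq j$, one finds:

$$\overline{\left\langle \sum_{i,j,k,l=1}^{N} \delta x_i \delta x_j \delta x_k \delta x_l \right\rangle} = \sum_{i=1}^{N} \overline{\langle \delta x_i^4 \rangle} + 3 \sum_{i \neq j = 1}^{N} \overline{\langle \delta x_i^2 \rangle \langle \delta x_j^2 \rangle} = (3 + \kappa_0) \sum_{i=1}^{N} \overline{D_i^2} + 3 \sum_{i \neq j = 1}^{N} \overline{D_i D_j}, \qquad (1.122)$$

where we have used the definition of $\kappa_0$ (the kurtosis of
$\epsilon$). On the other hand, one must estimate
$\overline{\langle (\sum_{i=1}^{N} \delta x_i)^2 \rangle}^{\,2}$. One finds:

$$\overline{\left\langle \left(\sum_{i=1}^{N} \delta x_i\right)^2 \right\rangle} = \sum_{i=1}^{N} \overline{D_i} = N \bar D. \qquad (1.123)$$

Gathering the different terms and using the definition (1.121), one finally
establishes the following general relation:

$$\kappa_N = \frac{1}{N^2 \bar D^2} \left[ N \bar D^2 (3 + \kappa_0)(1 + g(0)) - 3N \bar D^2 + 3 \bar D^2 \sum_{i \neq j = 1}^{N} g(|i - j|) \right], \qquad (1.124)$$

or:

$$\kappa_N = \frac{1}{N} \left[ \kappa_0 + (3 + \kappa_0)\, g(0) + 6 \sum_{\ell=1}^{N} \left(1 - \frac{\ell}{N}\right) g(\ell) \right]. \qquad (1.125)$$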